

Faculty of Electronic Engineering Niš, Serbia



organized by the Innovation Centre of Advanced Technologies and the Faculty of Electronic Engineering Niš

# **Publisher:**

Faculty of Electronic Engineering, Niš P.O. Box 73, 18000 Niš Serbia http://www.elfak.ni.ac.rs

and

Innovation Center of Advanced Technologies Vojvode Mišića 58/2 18000 Niš http://www.icnt.rs/kontakt/kontakt.php

# **Editor:**

Vančo Litovski

CIP – Cataloguing in Publication, National Library of Serbia, Belgrade

519.876.5(082) 004.942(082)

SMALL Systems Simulation Symposium (5; 2014; Niš)
5th Proceedings of the Small Systems Simulation Symposium, 2014, February 12-14, Niš, Serbia / organized by The Faculty of Electronic Engineering and The Innovation Center of Advanced Technologies; [editor Vančo Litovski].
Niš: Faculty of Electronic Engineering, 2014 (Niš : Unigraf). - 138 pp. : illus. ; 30 cm

Text printed in two columns. - Print run 50. - Bibliography with each paper. - Index.

ISBN 978-86-6125-098-9

1. Faculty of Electronic Engineering (Niš) 2. Innovation Centre of Advanced Technologies (Niš)

a) Simulation – Proceedings

COBISS.SR-ID 204680460

Printed by: "Unigraf", Niš

# Preface

# To the Proceedings of the 5th Small Systems Simulation Symposium

# Dear colleagues, dear friends, dear guests,

It is now 14 years since the First Small Systems Simulation Symposium took place here at the Faculty of Electronic Engineering of the University of Niš in the year 2000. Those were difficult times for Serbia and for all of us, but thanks to the enthusiasm of the staff of our Laboratory for Electronic Design Automation (LEDA) and of our foreign friends, we succeeded in establishing an international meeting that lasts. This is why I want to stress here the names of some of our friends from abroad who contributed decisively to the development of SSSS: Robert Ivan Damper, Mark Zwolinski, and Tomasz Kazmierski from the University of Southampton, Ebrahim Busheri from Middlesex University, Michel Lenzner from the University of Besançon, Vazgen Melikyan from the University of Erevan, Octavio Nieto and Slobodan Bojanić from Universidad Politécnica de Madrid, and last but not least, Volker Zerbe from the Technical University of Erfurt.

During all these years we have tried to promote our research results, and especially the results of our young associates, in an environment and atmosphere meant to be properly critical and, at the same time, encouraging enough for further endeavors.

SSSS became a place to review the two years' efforts of all of us, to see the actual results, and to plan future research. It was a place where professionals from different parts of the world got together around the challenge of simulation, which will always impose new and more complex tasks on designers and scholars.

SSSS, while dedicated to simulation, was, is, and I hope will be also a platform enabling the convergence of design ideas, methods, and tools that are supported by simulation methods. That gave it a broad audience and, as we all know, made it a place to report the results of many international collaborative efforts and projects.

As for the content of this issue, you will find nothing but the same as described above. We still have strong international participation, with new names emerging. Here we especially want to note our colleagues Prof. M. Marinov from Bulgaria, Prof. D. Trajanov from Macedonia, and Prof. B. Dokić from B&H. What is new, and we think of prime importance, is that we have started to gather colleagues not only from academia but also from European and domestic industry and research centres.

It is important to note that the Yugoslav Simulation Society, which was the first initiator of the SSSS, no longer exists, and we are thankful to the Innovation Centre of Advanced Technologies, a privately owned research institute from Niš, which took on the responsibility of taking care of the symposium in the future.

Respectfully

Prof. Litovski

Vančo Litovski

The 5<sup>th</sup> Small Systems Simulation Symposium

was supported by

Ministry of Education, Science, and Technological Development of Serbia,

Rohde & Schwarz, Serbia

The Faculty of Technical Sciences, Čačak, Serbia

and

Ratel, Republic Agency for Electronic Communications, Serbia

# STEERING COMMITTEE OF THE SSSS2014

- A. Belić, Institute of Physics, Belgrade (Serbia)
- S. Bojanić, Universidad Politécnica de Madrid (Spain)
- B. Blagojević, University of Niš (Serbia)
- I. Bushehri, Lime Microsystems (United Kingdom)
- M. Jevtić, University of Niš (Serbia)
- B. Damper, University of Southampton (United Kingdom)
- B. Dokić, University of Banja Luka (Bosnia and Herzegovina)
- G. S. Djordjević, University of Niš (Serbia)
- N. Janković, University of Niš (Serbia)
- T. Kazmierski, University of Southampton (United Kingdom)
- V. Litovski, NiCAT cluster (Serbia)
- O. Nieto, Universidad Politécnica de Madrid (Spain)
- D. Pantić, University of Niš (Serbia)
- B. Reljin, University of Belgrade (Serbia)
- S. Milenković, Lime Microsystems (United Kingdom)
- P. Petković, University of Niš (Serbia)
- M. Smiljanić, University of Belgrade (Serbia)
- S. Stanković, University of Belgrade (Serbia)
- D. Trajanov, FCSE (FINKI) Skopje (Macedonia)
- V. Zerbe, Technical University of Ilmenau (Germany)
- M. Zwolinski, University of Southampton (United Kingdom)

# ORGANIZING COMMITTEE OF THE SSSS2014

- M. Andrejević Stošović, University of Niš (Serbia)
- S. Bojanić, Universidad Politécnica de Madrid (Spain)
- M. Dimitrijević, University of Niš (Serbia)
- S. Đorđević, University of Niš (Serbia)
- B. Jovanović, University of Niš (Serbia)
- V. Litovski, University of Niš (Serbia)
- M. Milić, University of Niš (Serbia)
- J. Milojković, ICNT Niš (Serbia)
- D. Milovanović, University of Niš (Serbia)
- D. Mirković, University of Niš (Serbia)
- P. Petković, University of Niš (Serbia) / President
- Z. Petković, University of Niš (Serbia)

# SYMPOSIUM SECRETARY

Marko Dimitrijević

Faculty of Electronic Engineering Aleksandra Medvedeva 14 18000 Niš Serbia Tel: +381 18 529321 marko@venus.elfak.ni.ac.rs

# **Proceedings of**

# The 5th Small Systems Simulation Symposium

Niš, Faculty of Electronic Engineering, 12-14 February, 2014

# Contents

- 1.1 Dušan Krčum, Dušan Grujić, Milan Savić, and Lazar Saranovac, "Behavioural Simulation of 60 GHz FMCW Radar Using CppSim Simulator"
- 2.1 Vazgen Melikyan, Arthur Sahakyan S., Svetlana Poghosyan M., Armen Sahakyan S., Vahe Babayan, "High PSRR Gain-Boosted Rail-To-Rail OTA"
- 2.2 Vazgen Melikyan, Hayk Dingchyan, Artak Hayrapetyan, Arthur Sahakyan, Vardan Grigoryants, Vahe Babayan and Ashot Martirosyan, "The Correlated Level Shifting as a Gain Enhancement Technique for Comparator Based Integrators"
- 2.3 Jelena Mišić, Vera Marković, "Wideband Low Noise Amplifier for Long Term Evolution Systems"
- 2.4 Miljana Milić and Vančo Litovski, "Testing Capacitors' Hard Defects in Notch SC Filters Using the Oscillation Method"
- 2.5 Dejan Mirković, Predrag Petković and Dragiša Milovanović, "Analog Design Challenges in Advanced CMOS Process Node"
- 2.6 Miona Andrejević Stošović, Marko Dimitrijević, and Vančo Litovski, "SPICE Model of a Linear Variable Capacitance"
- 2.7 Jugoslav Joković, Nebojša Dončov, Bratislav Milovanović and Tijana Dimitrijević, "Analysis of Outdoor Emissions from Printed Circuit Board Enclosed in Metallic Box with Aperture"
- 3.1 Mariya Spasova, George Angelov, Anna Andonova, Tihomir Takov, and Marin Hristov, "Analysis of Electronic Structure of Carbon Nanotubes"
- 3.2 Dušan Grujić, Pavle Jovanović, Dušan Krčum, and Milan Savić, "RFIC Passive Component Design and Simulation in Python"
- 3.3 Sanja Aleksić, Danijela Pantić, and Dragan Pantić, "Analysis and Simulation of the Impact of Traps Generated at Si/SiO2 Interface and in Semiconductor Bulk Due to HEFS on MOSFET's Electrical Characteristics"
- 3.4 Leonid Djinevski, Sonja Filiposka, Igor Mishkovski and Dimitar Trajanov, "Wireless Ad Hoc Network Simulation in Cloud Environment"
- 3.5 Dragan Drača, Aleksandra Panajotović, and Nikola Sekulović, "Simulation of Dynamic Characteristic of L-branch Selection Combining Diversity Receiver in Nakagami-m Environment"
- 3.6 Dejan Stevanović, Predrag Petković, and Volker Zerbe, "Single Phase System for Detection of Harmonic Pollution Sources at Power Grid"
- 3.7 Zoran Petrušić and Andrija Petrušić, "Application of a Bidirectional Electricity Meter in the 5kw Grid-Connected Photovoltaic Power Plant"
- 3.8 Goran S. Đorđević, Miloš Petković, Darko Todorović, "Natural Patterns for Design and Control of Bi-Whegs in Quadruped Robot"
- 3.9 Aram Baghdasaryan, "Parallel Circuit Simulation on Graphical Processing Unit"
- 4.1 Tom Kazmierski and Charles Leech, "Synthesis of Application Specific Processor Architectures for Ultra-Low Energy Consumption"
- 4.2 Vladimir Petrović, Marko Ilić, Gunter Schoof, and Zoran Stamenković, "New Fault Tolerant Design Methodology Applied to Middleware Switch Processor", Invited lecture
- 4.3 Vladimir Zdraveski, Andrej Dimitrovski and Dimitar Trajanov, "HDL IP Cores System as an Online Testbench Provider"
- 4.4 Marko Dimitrijević, Miona Andrejević Stošović, Octavio Nieto, Slobodan Bojanić, and Vančo Litovski, "Computer Workstation Vetting by Supply Current Monitoring"
- 4.5 Borisav Jovanović, Milunka Damnjanović, "Glitch Free Clock Switching Techniques in Modern Microcontrollers"
- 4.6 Milena Stanojlović, Vančo Litovski and Predrag Petković, "Testing an SCA hardened combinational standard cell preliminary considerations"
- 4.7 Ana Krkljić, Branko Dokić and Velibor Škobić, "FPGA Implementation of AES Algorithm"
- 4.8 Velibor Škobić, Branko Dokić and Željko Ivanović, "FPGA Implementation of Montgomery Modular Multiplier"

List of authors

# Behavioral Simulation of 60 GHz FMCW Radar using CppSim Simulator

Dušan Krčum, Dušan Grujić, Milan Savić, and Lazar Saranovac

Abstract - This paper presents the modelling of a 60 GHz FMCW radar system using CppSim. The modelling methodology for the fundamental radar blocks is discussed, and the models are validated in several realistic scenarios. The simulation results can be used as a guideline in the design and usage of modern radars in the millimeter-wave frequency band.

Keywords - Behavioral simulation, CppSim, Radars.

#### I. INTRODUCTION

Radar systems are nowadays widely used in automotive, healthcare, and security applications [1,2,3,4]. Like communication systems, radar systems are very sensitive to noise, intermodulation distortion, and other nonlinearities, whose influence on system performance cannot be fully determined analytically. System blocks have to be analysed separately in order to gain deep insight into their impact on system performance. Particular attention should be paid to the nonlinear characteristic of the voltage controlled oscillator (VCO), its phase noise, distortion coming from the amplifiers, phase and amplitude mismatch of the quadrature signals, and Tx-Rx leakage. One of the major challenges in modern radar design is to provide inexpensive, fast, and reliable system-level simulation [5, 6, 7, 8]. The aim of this paper is to present the use of a free general-purpose time-domain simulator, CppSim, for the simulation of FMCW radar. CppSim simulations were carried out in order to explore the possibilities and limitations in the design and usage of modern radar systems. CppSim is not adapted to the simulation of systems such as radars, so it was necessary to develop suitable models of the building blocks, the radar channel, and stationary and moving targets. The simulator works in the time domain and supports transient noise. Exact system analysis requires insight into both the time and frequency domains, and attention must be paid to digital signal processing, using either CppSim or external tools such as Octave and Python.

## II. FMCW Radar

The block diagram of an FMCW radar is presented in Fig. 1. The fundamental blocks of an FMCW radar are the voltage controlled oscillator, power amplifier, LNA, mixer, and low-pass filter. FMCW radar uses a time-varying transmit frequency in order to determine the range and velocity of targets.

Dušan Krčum, Dušan Grujić, and Milan Savić are with Novel*IC* Microsystems, Omladinskih brigada 86p, Belgrade 11000, Serbia, E-mail: {dusan.krcum, dusan.grujic, milan.savic}@novelic.com

Lazar Saranovac is with School of Electrical Engineering, University of Belgrade, Belgrade 11000, Serbia. E-mail: laza@el.etf.rs



Fig. 1. Block diagram of standard FMCW radar front end.

The VCO frequency is swept linearly in time over the frequency deviation $\Delta f$:

$$f_{VCO}\left(t\right) = f_0 + \frac{\Delta f}{T_{ch}}t,\qquad(1)$$

where $f_0$ is the carrier frequency, $\Delta f$ is the frequency deviation, and $T_{ch}$ is the duration of the frequency change. In practice, the VCO frequency change is realized using a triangular modulation signal, as shown in Fig. 2.



This signal is amplified by the power amplifier (PA) and transmitted. The signal reflected from the target is an attenuated and time-delayed copy of the transmitted signal. The received signal is amplified by the LNA and mixed with the transmitted signal. The frequency difference between the instantaneous VCO frequency and the received frequency, called the beat frequency $f_b$, is proportional to the time of flight $t_{flight}$:

$$f_b = \frac{\Delta f}{T_{ch}} t_{flight} \,. \tag{2}$$

The target range $R$ is then given by:

$$R = \frac{T_{ch}c}{2\Delta f} f_b \,. \tag{3}$$
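Equations (2) and (3) can be checked numerically with a short Python sketch. It is only an illustration, not part of the CppSim models: the function names are hypothetical, the two-way delay $t_{flight} = 2R/c$ is the relation implied by combining (2) and (3), and the example values ($\Delta f$ = 2 GHz, $T_{ch}$ = 1 ms) are assumptions chosen for the demonstration.

```python
C = 299_792_458.0  # speed of light in m/s


def time_of_flight(range_m):
    """Two-way propagation delay to a target at range_m (implied by Eqs. 2 and 3)."""
    return 2.0 * range_m / C


def beat_frequency(range_m, delta_f, t_ch):
    """Beat frequency from Eq. (2): f_b = (delta_f / T_ch) * t_flight."""
    return delta_f / t_ch * time_of_flight(range_m)


def range_from_beat(f_b, delta_f, t_ch):
    """Target range from Eq. (3): R = T_ch * c / (2 * delta_f) * f_b."""
    return t_ch * C / (2.0 * delta_f) * f_b


if __name__ == "__main__":
    delta_f = 2e9   # assumed frequency deviation in Hz
    t_ch = 1e-3     # assumed duration of one frequency ramp in s
    for r in (1.0, 2.0, 5.0):
        fb = beat_frequency(r, delta_f, t_ch)
        print(f"R = {r:.1f} m -> f_b = {fb / 1e3:.2f} kHz, "
              f"recovered R = {range_from_beat(fb, delta_f, t_ch):.3f} m")
```

Because (3) is the inverse of (2), converting a range to a beat frequency and back should return the original range.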

# III. CppSim Simulator

CppSim is a free, general-purpose behavioral simulator developed at MIT, originally for behavioral simulation of PLLs [9]. Over time the scope of the tool has expanded, and it has been used for behavioral modeling of other circuits as well. System blocks are described in C++, and there are many built-in models for typical blocks in communication systems. The tool allows new block models to be created, existing ones to be modified, and complex circuits to be modeled in great detail. Digital circuits can be described in Verilog and simulated directly in CppSim: synthesizable Verilog code is translated to C++ using an external tool called Verilator [9], and the simulation then runs as a standard C++ program. The tool runs time-domain simulations, which is important for transient noise analysis. Because the models are written in C++, simulations are fast and modeling is very flexible. The simulator also has an integrated schematic environment that facilitates system overview.

# IV. 60 GHZ FMCW RADAR MODELING USING CPPSIM

CppSim does not offer suitable models for accurate simulation of systems such as FMCW radar. In order to provide compatible models, FMCW radar blocks are modeled in C++:

- Low Noise Amplifier (LNA)
- Power Amplifier (PA)
- Voltage Controlled Oscillator (VCO)
- Band pass and Low pass Filters (BPF/LPF)
- Radar propagation channel
- Stationary and moving targets

The detailed modeling procedure for these blocks is presented in the following subsections.

#### A. Nonlinear LNA with Noise

Nonlinear LNA model is described with the following parameters:

- Gain [dB] linear gain
- P1dBin [dBm] input referred 1dB compression Power
- NF [dB] noise figure

The most common model of LNA nonlinearity is a third-order polynomial [10]:

$$y(t) = a_1 x(t) - a_3 x^3(t), \tag{4}$$

where $x(t)$ is the input signal and $y(t)$ is the output signal. Coefficient $a_1$ is the linear gain, and it can be calculated as:

$$a_1 = 10^{\frac{Gain}{20}}. \tag{5}$$

Coefficient $a_3$ carries information about the 1 dB compression point and the third-order intermodulation products. This coefficient is determined from the following equations:

$$IIP_{3} = P_{1dB_{in}} + 9.6\,[dB], \tag{6}$$

$$V^{2}_{IP_{3}} = 2R_{REF} \cdot 10^{\frac{IIP_{3}}{10}} \cdot 10^{-3}\,[V^{2}]. \tag{7}$$

Combining equations 6 and 7, we get the final expression for coefficient $a_3$:

$$a_3 = \frac{4a_1}{3V_{IP_3}^2}. \tag{8}$$
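The coefficient calculation above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' C++ block: the function names are hypothetical, $R_{REF}$ = 50 Ω is assumed, and the example parameter values are chosen only for the demonstration. The last helper uses the standard result that, for a sinusoidal input of amplitude $A$, the cubic term reduces the fundamental gain to $a_1 - \frac{3}{4}a_3A^2$.

```python
import math


def lna_coefficients(gain_db, p1db_in_dbm, r_ref=50.0):
    """Return (a1, a3) of y = a1*x - a3*x**3, following Eqs. (5)-(8)."""
    a1 = 10.0 ** (gain_db / 20.0)                               # Eq. (5)
    iip3_dbm = p1db_in_dbm + 9.6                                # Eq. (6)
    v_ip3_sq = 2.0 * r_ref * 10.0 ** (iip3_dbm / 10.0) * 1e-3   # Eq. (7)
    a3 = 4.0 * a1 / (3.0 * v_ip3_sq)                            # Eq. (8)
    return a1, a3


def lna_output(x, a1, a3):
    """Memoryless third-order nonlinearity, Eq. (4)."""
    return a1 * x - a3 * x ** 3


def peak_amplitude(power_dbm, r_ref=50.0):
    """Peak voltage of a sinusoid delivering power_dbm into r_ref."""
    return math.sqrt(2.0 * r_ref * 10.0 ** (power_dbm / 10.0) * 1e-3)


def fundamental_gain_db(amplitude, a1, a3):
    """Gain at the fundamental for x = A*cos(wt): a1 - (3/4)*a3*A**2."""
    return 20.0 * math.log10(a1 - 0.75 * a3 * amplitude ** 2)


if __name__ == "__main__":
    gain_db, p1db_in = 10.0, 1.0            # assumed example values
    a1, a3 = lna_coefficients(gain_db, p1db_in)
    a_in = peak_amplitude(p1db_in)
    drop = fundamental_gain_db(a_in, a1, a3) - gain_db
    print(f"a1 = {a1:.4f}, a3 = {a3:.4f}, gain change at P1dB = {drop:.2f} dB")
```

Driving the model at the specified input compression power should reduce the fundamental gain by close to 1 dB, which is a quick sanity check of the 9.6 dB offset in (6).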

The noise characterization of the LNA was implemented as follows. Using the input and output signal-to-noise ratios (SNR), the noise factor can be determined as:

$$F = \frac{SNR_{IN}}{SNR_{OUT}} \,. \tag{9}$$

The input noise comes from the thermal noise of the source resistance, which has a spectral density of:

$$\overline{v_n^2} = 4kTR_{noise} \,. \tag{10}$$

The thermal noise power spectral density is given by:

$$P_n = \frac{\overline{v_n^2}}{R} = 4kT , \tag{11}$$

while the available noise power at a matched load is:

$$P_{n,match} = \frac{\left(\frac{v_n}{2}\right)^2}{R} = kT. \tag{12}$$
On the other hand, the noise factor is defined as:

$$F = 1 + \frac{R_{noise}}{R_{ref}}, \tag{13}$$

where $R_{ref}$ is the reference resistance, 50 Ω in our case.

From equation 13, the resistor that generates the same amount of noise as the LNA is:

$$R_{noise} = R_{ref} \left( F - 1 \right). \tag{14}$$

CppSim has a built-in function called *randg.inp()*, which generates random noise. This function needs the input noise spectral density as an input parameter.

Using these equations, a nonlinear model of the LNA has been developed. Besides the single-ended LNA, a differential one was also introduced.
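The noise-equivalent resistor of (14) and the corresponding voltage noise density of (10) can be illustrated with the following Python sketch. It does not use CppSim's noise generator: the function names are hypothetical, a temperature of 290 K is assumed, and the sample generator relies on Python's standard `random` module, scaling white Gaussian samples so that their variance equals the one-sided density integrated over the Nyquist band $f_s/2$.

```python
import math
import random

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def noise_factor(nf_db):
    """Convert noise figure in dB to the linear noise factor F."""
    return 10.0 ** (nf_db / 10.0)


def equivalent_noise_resistance(nf_db, r_ref=50.0):
    """Eq. (14): R_noise = R_ref * (F - 1)."""
    return r_ref * (noise_factor(nf_db) - 1.0)


def noise_voltage_density(resistance, temperature=290.0):
    """Eq. (10) as an amplitude density: sqrt(4kTR) in V/sqrt(Hz)."""
    return math.sqrt(4.0 * K_BOLTZMANN * temperature * resistance)


def noise_samples(n, resistance, sample_rate, temperature=290.0, seed=0):
    """White Gaussian voltage samples whose variance is 4kTR * fs / 2."""
    sigma = noise_voltage_density(resistance, temperature) * math.sqrt(sample_rate / 2.0)
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


if __name__ == "__main__":
    r_noise = equivalent_noise_resistance(6.0)   # NF = 6 dB, R_ref = 50 ohm
    print(f"R_noise = {r_noise:.2f} ohm, "
          f"density = {noise_voltage_density(r_noise) * 1e9:.3f} nV/sqrt(Hz)")
```

A noiseless block (NF = 0 dB) maps to $R_{noise}$ = 0, so the same code path covers the case where the noise generator contributes nothing.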

#### B. Power amplifier

The nonlinear model of the PA is the same as the model of the LNA; the only difference is in the definition of the compression point, which for the PA is specified at the output. The parameters that describe the PA are:

- *Gain* [dB] linear gain
- *P<sub>1dBout</sub>* [dBm] output referred 1dB compression power
- NF [dB] noise figure

#### C. Voltage Controlled Oscillator

The VCO is an essential building block of radar systems, and its parameters have a significant impact on overall system performance. A simplified model of the phase noise profile was used in the first iteration. The modeled phase noise profile has two regions: $\frac{1}{f}$ and constant spectral density. Noise samples are calculated by the provided model for $\frac{1}{f}$ noise and added to the VCO control voltage. A more accurate model of VCO phase noise should be developed according to [11].

In a real radar system, IQ signals can be generated using a quadrature VCO or a VCO with a quadrature generator. In both cases amplitude and phase mismatches are present, and these imbalances degrade system performance. Phase and amplitude imbalances are incorporated in the existing models, so it is possible to evaluate their impact on performance. Our model assumes a quadrature VCO with the following parameters:

- *fo* [Hz] Center frequency
- *kv* [Hz/V] VCO characteristics
- $f_{corner}$ [Hz] Corner frequency between $\frac{1}{f}$ and constant spectral density
- *noise at offset* [dBc/Hz] Noise spectral density at $f_{offset}$ from the carrier
- *f<sub>offset</sub>* [Hz] Offset frequency at which the noise floor is specified
- *phase imb* [°] phase imbalance
- *amp imb* [dB] amplitude imbalance

Besides these parameters, the proposed VCO model also has a logical parameter *noise enable*. The value 0 switches off the noise generator in the VCO block. By default, the VCO is noisy.

In reality, the VCO frequency is not a linear function of the control voltage. This nonlinearity leads to a time-varying beat frequency, which can be interpreted as a moving target. This scenario can be modeled with a higher-order polynomial VCO characteristic instead of the linear one described by the *kv* parameter alone. The nonlinear VCO characteristic is modeled with a 14<sup>th</sup> order polynomial, and the simulation of the proposed model is presented in Section V.
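The effect of a nonlinear tuning characteristic on the beat frequency can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the 14<sup>th</sup> order model used in the paper: the function names, the linearly ramped control voltage, and the second-order coefficient are all hypothetical. The beat frequency is taken as the difference between the VCO frequency now and the VCO frequency one time of flight earlier.

```python
def vco_deviation(v_ctrl, coeffs):
    """Frequency deviation f - f0 for a polynomial tuning curve.

    coeffs[k] multiplies v_ctrl**(k + 1); coeffs = [kv] is the linear case.
    """
    acc = 0.0
    for c in reversed(coeffs):   # Horner evaluation of the polynomial
        acc = (acc + c) * v_ctrl
    return acc


def beat_trace(coeffs, t_flight, t_ch, v_span, n_points=5):
    """Beat frequency at n_points instants of one up-ramp of the control voltage."""
    trace = []
    for i in range(n_points):
        # Sample between t_flight and t_ch so that the echo is already present.
        t = t_flight + (t_ch - t_flight) * i / (n_points - 1)
        v_now = v_span * t / t_ch
        v_echo = v_span * (t - t_flight) / t_ch
        trace.append(vco_deviation(v_now, coeffs) - vco_deviation(v_echo, coeffs))
    return trace


if __name__ == "__main__":
    t_flight, t_ch, v_span = 6.67e-9, 1e-3, 1.0      # assumed example values
    linear = beat_trace([2e9], t_flight, t_ch, v_span)
    curved = beat_trace([2e9, 0.2e9], t_flight, t_ch, v_span)
    print("linear VCO   :", [round(f) for f in linear])
    print("nonlinear VCO:", [round(f) for f in curved])
```

With the linear characteristic the beat frequency stays constant over the ramp, while the added quadratic term makes it drift, which is the behavior that can be mistaken for a moving target.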

#### D. Filters

Filters are implemented using a 7<sup>th</sup> order Butterworth approximation. Several filters were used: a bandpass filter with cutoff frequencies of 54 GHz and 64 GHz, and a low-pass filter for beat frequencies with a cutoff frequency of 200 kHz. CppSim has a built-in Butterworth approximation class for filters, but the filter order is fixed and equal to 1. This class was modified, and the model parameters for the bandpass filters are:

- Gain [dB] linear gain
- BW

#### E. Radar Propagation Channel

Propagation of electromagnetic waves through the radar channel is described by a time delay, due to the finite velocity of propagation, and a propagation loss. The time delay is easily derived from the target distance, and the propagation loss is given by the radar equation:

$$P_r = \frac{P_t G_t G_r \left(\frac{c}{f}\right)^2 \sigma}{\left(4\pi\right)^3 R^4}, \tag{15}$$

where $P_t$ and $P_r$ are the transmitted and received power, $G_t$ and $G_r$ are the transmit and receive antenna gains, $f$ is the signal frequency, $\sigma$ is the radar cross section of the target, and $R$ is the target range.

Tx to Rx leakage was modeled using the same model. This leakage is important because the short path between Tx and Rx, caused by the proximity of the antennas, appears in the frequency domain as a target at a distance equal to the distance between the antennas.
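The channel model described above amounts to a delay and an attenuation, which can be sketched in Python as follows. This is a minimal illustration of the standard radar equation, not the CppSim channel block: the function names are hypothetical, antenna gains are linear (not dB), and the example values are assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def received_power(p_t, g_t, g_r, freq, rcs, range_m):
    """Radar equation: P_r = P_t*G_t*G_r*(c/f)^2*sigma / ((4*pi)^3 * R^4)."""
    wavelength = C / freq
    return p_t * g_t * g_r * wavelength ** 2 * rcs / ((4.0 * math.pi) ** 3 * range_m ** 4)


def round_trip_delay(range_m):
    """Two-way propagation delay to a target at range_m."""
    return 2.0 * range_m / C


def path_loss_db(g_t, g_r, freq, rcs, range_m):
    """Ratio P_t / P_r expressed in dB."""
    return -10.0 * math.log10(received_power(1.0, g_t, g_r, freq, rcs, range_m))


if __name__ == "__main__":
    # Assumed example: unity-gain antennas, 1 m^2 target, 60 GHz carrier.
    for r in (1.0, 2.0, 5.0):
        print(f"R = {r:.0f} m: delay = {round_trip_delay(r) * 1e9:.2f} ns, "
              f"loss = {path_loss_db(1.0, 1.0, 60e9, 1.0, r):.1f} dB")
```

Since the received power falls with $R^4$, doubling the range costs about 12 dB, which is why a short leakage path between the antennas can dominate the received spectrum.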

#### F. Stationary and Moving Targets

Stationary and moving targets implement free-space path loss and the target radar cross section, while the moving target additionally implements Doppler shift. They are characterized by the following parameters:

- *range* [m] Distance to target
- *rcs* [m<sup>2</sup>] Radar Cross Section
- *velocity* [m/s] Target velocity for moving targets

Static targets can be simulated with a dedicated static block, or with the moving target block with the velocity parameter equal to 0.

## V. Results

A test system has been built using the designed behavioral models, as shown in Fig. 3. The chirp signal is generated using a triangular modulation pattern. System block specifications are given in Table I.

Fig. 3. FMCW Radar in CppSim.

#### TABLE I.

MODEL PARAMETERS FOR DESIGNED FMCW RADAR

| Model Parameter                          | Label                  | Value      |
|------------------------------------------|------------------------|------------|
| Chirp frequency                          | $f_{ch}$               | 500 Hz     |
| Carrier frequency                        | $f_0$                  | 60 GHz     |
| Frequency deviation                      | $\Delta f$             | 2 GHz      |
| Phase noise offset                       | $f_{offset}$           | 1 kHz      |
| Noise floor relative to carrier at 1 MHz | /                      | -90 dBc/Hz |
| Amplitude of generated signal            | A<sub>in</sub>         | 117 mV     |
| PA gain                                  | G<sub>PA</sub>         | 15 dB      |
| PA 1dB Compression                       | P<sub>1dBPA</sub>      | 1 dBm      |
| PA output saturation power               | P<sub>satoutPA</sub>   | 20 dBm     |
| LNA gain                                 | G<sub>LNA</sub>        | 10 dB      |
| LNA noise figure                         | NF<sub>LNA</sub>       | 6 dB       |
| LNA 1dB Compression                      | P<sub>1dBLNA</sub>     | 1 dBm      |
| LNA output saturation power              | P<sub>satoutLNA</sub>  | 20 dBm     |

The first test was performed with three static targets at distances of 1 m, 2 m, and 5 m. All noise generators and other nonidealities are included in the simulation. Theoretically, the expected peaks in the spectrum of the I or Q signal are located at 13.33 kHz, 26.67 kHz, and 66.67 kHz for these targets. Peaks at exactly these frequencies are visible in Fig. 4.

Moving targets are also considered. Due to the Doppler frequency shift, the expected peaks in the output spectrum are at frequencies $f_b - f_{Doppler}$ and $f_b + f_{Doppler}$. The Doppler shift is proportional to the instantaneous velocity of the target. As an example, a single moving target was checked. The output spectrum is presented in Fig. 5, where two peaks are visible. The chosen velocity was $v = 5$ m/s, which corresponds to a Doppler shift of $f_{Doppler} = \frac{2vf}{c} = 2.4$ kHz.

In order to demonstrate the nonlinear characteristic of the VCO and the Tx to Rx leakage, a test bench with a single static target was chosen. The time-domain output signal is presented in Fig. 6. In the ideal scenario, the signal would be a pure sinusoid with frequency $f_b$. In Fig. 6, a low-frequency sine wave caused by the leakage is visible, with the beat-frequency sine wave riding on it. The nonlinear VCO is recognized by the time-varying beat frequency.



Fig. 4. Spectrum of I branch for 3 static targets.



Fig. 5. Spectrum of I branch for one moving target.



Fig. 6. Time domain output signal.

# VI. CONCLUSION

After a brief overview of FMCW radar operation and a short description of the CppSim simulator, models of FMCW radar blocks have been developed. Using these models, various radar systems have been simulated in realistic scenarios. Some simulation results are given in the paper. The developed radar models can facilitate design and improve understanding of the impact of nonidealities on millimeter-wave FMCW radar performance.

#### REFERENCES

- [1] http://www.novelic.com/index.php/product
- [2] J. Lee, Y. A. Li, M. H. Hung, S. J. Huang, "A Fully-Integrated 77GHz FMCW Radar Transceiver in 65nm CMOS Technology", IEEE Journal of Solid State Circuits, Vol. 45, No. 12, December 2010.

- [3] Y. Sun, "Design of an Integrated 60 GHz Transceiver Front-End in SiGe:C BiCMOS Technology", PhD thesis, Brandenburgischen Technischen Universität, 2009.
- [4] T. Mitomo, N. Ono, H. Hoshino, Y. Yoshihara, O. Watanabe, I. Seto, "A 77 GHz 90 nm CMOS Transceiver for FMCW Radar Applications", IEEE Journal of Solid State Circuits, Vol. 45, No. 4, April 2010.
- [5] M. I. Skolnik, "Introduction to Radar Systems", 3rd edition, McGraw-Hill, Singapore, 1981.
- [6] D. K. Barton, "Modern Radar System Analysis", Artech House Radar Library, Norwood, 1988.
- [7] I. V. Komarov, S. M. Smolskiy, "Fundamentals of Short-range FM Radar", Artech House Radar Library, Norwood, 2003.
- [8] B. R. Mahafza, "Radar Systems Analysis and Design Using MATLAB", 3rd Edition, CRC Press, 2013.
- [9] M. H. Perrott, CppSim, available at www.cppsim.com/
- [10] K. Kundert, "Introduction to RF Simulation and its Application", Version 2, Designer's Guide Consulting, Inc., April 2003, available at http://www.designers-guide.org/
- [11] A. Hajimiri, T. H. Lee, "A General Theory of Phase Noise in Electrical Oscillators", IEEE Journal of Solid-State Circuits, Vol. 33, No. 2, February 1998.

# High PSRR Gain-Boosted Rail-To-Rail OTA

Vazgen Melikyan Sh., Arthur Sahakyan S., Svetlana Poghosyan M., Armen Sahakyan S., Vahe Babayan S.

Abstract - A high-gain, high-bandwidth operational transconductance amplifier (OTA) with high PSRR is presented in this paper. It can be used in switched-capacitor filters and/or pipeline A/D converters. The best trade-off between DC gain, speed, and PSRR is demonstrated for this design. The OTA achieves a constant large-signal DC gain of > 90 dB and a PSRR of > -23 dB over process and temperature variations. It is designed in a 28 nm CMOS process and draws a DC power of 7 mW from a 1.8 V supply. The worst-case settling time to < 0.05% accuracy is ~ 8.3 ns. The presented correction technique can be used in high-speed ADCs and in special input/output circuits of several standards, such as Peripheral Component Interconnect (PCI) and Universal Serial Bus (USB).

*Keywords* - Power Supply Rejection Ratio (PSRR), operational transconductance amplifier (OTA), rail-to-rail (R2R), gain-boosted, bandwidth (BW)

# I. INTRODUCTION

Typical gain-boosting structures are shown in Fig. 1. Designing analog functional blocks with high gain, large bandwidth, and high PSRR under a limited supply voltage (0.8-1.8 V) is becoming more and more difficult, because when MOSFETs are cascoded to improve PSRR, the output signal swing becomes increasingly limited. Power supply noise can significantly decrease performance by reducing the dynamic range of the whole system, especially in high-precision systems or when the circuits that are sensitive to supply noise are at the very beginning of the power supply/reference chain. Cascoding is the most widely used method to achieve high gain and to reject the channel-length modulation effect; it is preferred over 2-stage OpAmp designs because of its superior frequency response.

This paper presents a CMOS single-supply operational transconductance amplifier (OTA) with high PSRR. A high-output-impedance current source and noise reduction techniques are used to improve the PSRR both at DC and at higher frequencies, up to the gain bandwidth (GBW) of the OTA.

The main problem when trying to cascode more transistors is the limited supply voltage. The gain-boosting technique (Fig. 1) [1] was introduced to remedy this problem. It allows the DC gain of the operational amplifier to be increased without sacrificing the output swing of a regular cascode structure.

Vazgen Melikyan, Arthur Sahakyan, Svetlana Poghosyan, Armen Sahakyan and Vahe Babayan are with the Department of Microelectronics Circuits and Systems, State Engineering University of Armenia, Teryan 105, 0009, Yerevan, Armenia, E-mail: vazgenm@synopsys.com, arthurs@synopsys.com, sveta@synopsys.com, sahakyanarmen5@gmail.com, vahebab@gmail.com



Fig.1. Gain-Boosting Technique

This can provide high speed and high gain at the same time. A gain-boosted OTA (GB OTA) [2] can achieve high gain and remain operational in high-speed systems such as switched-capacitor filters (Fig. 2) and ADCs. This leads to high-quality performance and eliminates inequality of the input signals. As a result of the phenomena mentioned above, the system may fail to function under some operating conditions, such as high temperatures or over-voltages.



Fig.2. Bode plot for high bandwidth, high gain and Gain-Boosting Technique

Generally, PSRR is defined as the gain from the input to the output divided by the gain from the supply to the output. At low frequencies it can be approximated as (Eq. (1)):

$$PSRR \sim g_{mIN}\left(r_{0CSload} \,\|\, r_{0IN}\right), \tag{1}$$

where $g_{mIN}$ is the transconductance of the amplifier input pair, and $r_{0CSload}$ and $r_{0IN}$ are the channel-length-modulation resistances of the current source load and of the input pair, respectively.
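The low-frequency estimate of Eq. (1) is easy to evaluate numerically. The Python sketch below is illustrative only: the function names and the device values are hypothetical and are not taken from the design, and the result is expressed as a positive ratio in dB, following the definition given above.

```python
import math


def parallel(r1, r2):
    """Parallel combination of two resistances."""
    return r1 * r2 / (r1 + r2)


def psrr_low_freq_db(gm_in, r0_cs_load, r0_in):
    """Eq. (1) in dB: 20*log10(gm_in * (r0_cs_load || r0_in))."""
    return 20.0 * math.log10(gm_in * parallel(r0_cs_load, r0_in))


if __name__ == "__main__":
    gm_in = 1e-3          # assumed input-pair transconductance in S
    r0_in = 100e3         # assumed input-pair output resistance in ohm
    for r0_load in (100e3, 1e6, 10e6):   # assumed current source load resistances
        print(f"r0_CSload = {r0_load / 1e3:.0f} kohm -> "
              f"PSRR ~ {psrr_low_freq_db(gm_in, r0_load, r0_in):.1f} dB")
```

The sweep shows the reason for using a high-output-impedance current source: once the load resistance is much larger than $r_{0IN}$, the estimate is limited by the input pair alone.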

It should be pointed out that a pole-zero doublet is known to be often associated with gain boosting. The result of a small-signal analysis of the gain-boosting technique is presented later, in the simulation results section. It will become clear that the pole-zero doublet and its consequence, slow settling (Fig. 3), can be very well suppressed.



Fig.3. Settling time calculation

# II. HIGH PSRR GAIN-BOOSTED OTA CIRCUIT ARCHITECTURE

The structure of the proposed high-PSRR GB OTA with rail-to-rail inputs is presented in Fig. 4.



Fig.4. Circuit structure GB High PSRR OTA with Rail-to-Rail inputs

To increase PSRR, this architecture uses Cpsrr capacitors connected from the supply to the bias points of the current mirrors, which can be controlled by the ENpsrr signal (Fig. 4), and a rail-to-rail input stage to extend the input signal range. This structure provides high small-signal gain and high PSRR with no bandwidth degradation. Therefore, optimizing the unity-gain bandwidth of the gain booster becomes difficult. The technique used in this paper is to place a small capacitor (usually a MOS capacitor) from VDD to the current mirror bias points to reject noise coming from the supply.

The capacitive feedback (f/b) network provides a feed-forward path through which the input signal bypasses the OTA, and the effect of the resulting feed-forward zero should be minimized. Since the f/b is negative, an instantaneous input step introduces a large spike at the output which, unfortunately, is in the direction opposite to the final output voltage. A straightforward calculation yields the following expression for this spike:

$$\Delta V_o = V_o \cdot \beta \cdot \frac{C_f}{C_f + C_L} \qquad (2)$$

Since the f/b factor $\beta$ is largely fixed by the closed-loop gain, the only parameter we can control to minimize the initial spike is the ratio $C_L/C_f$.
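The dependence of the initial spike on the capacitor ratio follows directly from Eq. (2), since $\frac{C_f}{C_f + C_L} = \frac{1}{1 + C_L/C_f}$. The Python sketch below tabulates it; the function names and all numerical values are hypothetical and serve only to illustrate the trend.

```python
def feedforward_spike(v_o, beta, c_f, c_l):
    """Eq. (2): initial output spike dVo = Vo * beta * Cf / (Cf + CL)."""
    return v_o * beta * c_f / (c_f + c_l)


def spike_vs_ratio(v_o, beta, ratios):
    """Same expression written in terms of the ratio CL/Cf."""
    return [v_o * beta / (1.0 + ratio) for ratio in ratios]


if __name__ == "__main__":
    v_o, beta = 1.0, 0.5                 # assumed output step and feedback factor
    ratios = [0.5, 1.0, 2.0, 5.0, 10.0]  # assumed CL/Cf values
    for ratio, spike in zip(ratios, spike_vs_ratio(v_o, beta, ratios)):
        print(f"CL/Cf = {ratio:4.1f} -> spike = {spike * 1e3:.1f} mV")
```

A larger $C_L/C_f$ monotonically reduces the spike, at the cost of a heavier load on the OTA.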

In the ideal OTA, the output current is a linear function of the differential input voltage, calculated as follows:

$$I_{\text{out}} = (V_{\text{in}+} - V_{\text{in}-}) \cdot g_{\text{m}}, \tag{3}$$

where Vin+ is the voltage at the non-inverting input, Vin– is the voltage at the inverting input and  $g_m$  is the transconductance of the amplifier.

When the ENpsrr signal equals logic "0", the PMOS switches turn on and the Cpsrr capacitors connect the corresponding current mirror bias points to VDD, rejecting noise coming from the supply.

# **III. OPERATION PRINCIPLE AND AMPLIFYING**

The block diagram in Fig. 5 is proposed for amplifying differential signals while providing wide frequency bandwidth and high PSRR. As is known, the DC gain of an amplifier is inversely proportional to its bandwidth. It is therefore imperative to have high DC gain and high PSRR, in order to avoid unequal distribution when the system operates under negative feedback. Differential analog signals arrive at the rail-to-rail input stage of the OTA (Fig. 5). These signals set the branch currents of the differential amplifier, and the gain-boosted system provides low transconductance variation. The high-PSRR system cancels supply noise and the supply dependency of the OTA bias points; when the ENpsrr enable signal is equal to logic "0", the system provides an output signal with high PSRR.



Fig.5. Block diagram of proposed method with negative feedback

As mentioned above, the high-PSRR mode is enabled by the active ENpsrr input signal and provides high supply noise rejection at the output over PVT.

# **IV. SIMULATION RESULTS**

Simulations were performed with the circuit-level simulator HSPICE [4] for 20 PVT corners, including SS (slow-slow), TT (typical-typical), FF (fast-fast), SF (slow-fast) and FS (fast-slow), with supply voltage and temperature variations, to estimate the PSRR, the small-signal gain and the unity-gain BW.

Fig. 7(a) shows the OTA AC analysis results for the TT (55 °C) typical corner. The amplifier's best PSRR is close to -39 dB and its worst PSRR is -34 dB. Fig. 7(b) and Fig. 7(c) show the simulation results for the FF (-40 °C) and SS (125 °C) main PVT corners, respectively. Since the USB3 protocol works with a 5 Gb/s data rate, which means that the data have a 400 ps pulse period and a 200 ps pulse width, internal specifications were set for UGBW, PSRR, etc. After the PSRR mode is enabled, the PSRR changes from -15 dB to -38.7 dB at the TT corner. The USB3 specification defines the minimum phase margin as $45^{\circ}$.

The next important parameter is the settling time (ST), i.e. the time after which the OTA outputs stay within the settling zone. Table I shows the results for the three main corners.
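The timing figures quoted for the 5 Gb/s data rate can be checked with a short calculation. This is only a sketch: it assumes that the "pulse width" is one unit interval and that the "pulse period" refers to an alternating 1010 pattern, which is an interpretation rather than something stated explicitly in the text.

```python
def unit_interval_ps(bit_rate_gbps):
    """Duration of one bit (unit interval) in picoseconds."""
    return 1e3 / bit_rate_gbps


ui = unit_interval_ps(5.0)   # width of a single bit
period = 2 * ui              # one high bit plus one low bit (1010 pattern)
print(f"pulse width = {ui:.0f} ps, pulse period = {period:.0f} ps")
```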

|                                           | Bound Corners |            |            |
|-------------------------------------------|---------------|------------|------------|
| Main Parameters                           | TT, 25 °C     | SS, 125 °C | FF, -40 °C |
| Diff. Output Voltage Swing (peak-to-peak) | 1.8 V         |            |            |
| RMS Output Noise (1 Hz to 100 GHz)        | 34.2 µV       | 45.4 µV    | 32.7 µV    |
| Small-Signal $A_V$ (gain)                 | 98.4 dB       | 96.5 dB    | 101.2 dB   |
| Phase Margin (PM)                         | 82.0°         | 85.2°      | 83.8°      |
| Unity-Gain BW                             | 92.0 MHz      | 85.3 MHz   | 102.7 MHz  |
| Settling Time                             | 7.1 ns        | 5.3 ns     | 3.4 ns     |
| Total Power Consumption                   | 3.05 mW       | 3.54 mW    | 3.13 mW    |
| PSRR                                      | -38.7 dB      | -37.4 dB   | -39.3 dB   |
| Worst PSRR                                | -33.8 dB      | -29.7 dB   | -31.6 dB   |

TABLE I. SIMULATION RESULTS OF THE THREE MAIN CORNERS
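To put the PSRR figures into perspective, the sketch below converts the dB values into linear amplitude ratios. It assumes that the negative dB values denote the supply-to-output voltage gain, and the 10 mV supply ripple is a hypothetical value used only for illustration.

```python
def db_to_ratio(db):
    """Convert a voltage gain in dB to a linear amplitude ratio."""
    return 10 ** (db / 20.0)


PSRR_OFF_DB = -15.0   # TT corner, PSRR mode disabled (from the text)
PSRR_ON_DB = -38.7    # TT corner, PSRR mode enabled (Table I)

improvement_db = PSRR_OFF_DB - PSRR_ON_DB
improvement_ratio = db_to_ratio(improvement_db)

ripple_mv = 10.0      # hypothetical supply ripple amplitude
out_off = ripple_mv * db_to_ratio(PSRR_OFF_DB)
out_on = ripple_mv * db_to_ratio(PSRR_ON_DB)

print(f"improvement: {improvement_db:.1f} dB (x{improvement_ratio:.1f})")
print(f"output ripple: {out_off:.2f} mV -> {out_on:.3f} mV")
```

Under these assumptions, enabling the PSRR mode reduces the supply ripple seen at the output by roughly a factor of 15 at the TT corner.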











Fig.7. OTA PSRR alternating current analyses results for TT (a), FF (b) and SS (c) corners

# VI. CONCLUSIONS

A circuit has been designed for high-speed systems integrated in negative feedback structures. Amplifiers like the proposed one provide high gain, a high operating frequency and high PSRR from supply and ground simultaneously, and can operate as stable amplifiers in settling systems.

The worst PSRR value for the TT corner is -33.8 dB and the best PSRR value is -38.7 dB, compared with -15 dB without the PSRR mode enabled; the BW is 92 MHz and the PM is $82^{\circ}$, whereas the USB3 specification requires $45^{\circ}$.

The proposed method can be implemented for input/output protocols such as USB, PCI, etc.

## References

- [1] K. Bult and G. Geelen. "A fast-settling CMOS op amp with 90-dB DC gain and 116 MHz unity-gain frequency" ISSCC Dig. Tech. Papers, Feb. 1990. pp. 108-109.
- [2] Ayman Shabra and Hae-Seung Lee, "Oversampled Pipeline A/D Converters With Mismatch Shaping", IEEE Journal of Solid-State Circuits, vol. 37, no. 5, May 2002
- [3] Yun Chiu, Ken Wojciechowski. "A Gain-Boosted 90dB Dynamic Range Fast Settling OTA with 7.8-mW Power Consumption", University of California, Berkeley
- [4] HSPICE Application Manual, Synopsys Inc. 2010. 196p.
- [5] K. Bult and G. Geelen. "The CMOS gain-boosting technique," Analog Integrated Circuits and Signal Processing, vol.1, (no.2), Oct. 1991. pp. 119-35.
- [6] Sobhy, E.A.; Hoyos, S.; Sanchez-Sinencio, E. *"High-PSRR low-power single supply OTA"*, Electronics Letters (Volume:46, Issue: 5) 08 March 2010; pp 337 - 338

# The Correlated Level Shifting as a Gain Enhancement Technique for Comparator Based Integrators

Vazgen Melikyan, Hayk Dingchyan, Artak Hayrapetyan, Arthur Sahakyan, Vardan Grigoryants, Vahe Babayan and Ashot Martirosyan

*Abstract* - This paper shows that the correlated level shifting (CLS) technique can be used successfully in comparator-based integrators with the same benefits as in conventional integrators. With the CLS technique, the signal-dependent variation is reduced by letting the current source operate with more voltage headroom, which increases the output impedance of the current source. This leads to a higher gain of the system and a more linear voltage ramp produced by the current source. The technique is especially useful in deep sub-micron technologies with supply voltages below 1 V, where the headroom of a current source is low. The proposed comparator-based integrator uses a comparator with about 27 dB of gain, which translates into 54 dB of effective gain. The integrator was used to design a MASH sigma-delta modulator which achieved an SQNR of 86.0 dB at an OSR of 64.

*Keywords* - comparator-based integrators, sigma-delta modulator, correlated level shifting

# I. INTRODUCTION

Switched-capacitor circuit techniques are widely used in many applications including filtering and analog-to-digital conversion. The performance of such circuits relies on achieving good capacitor matching and high opamp DC gain, both of which limit the achievable accuracy. Because of the low gain of the op-amp, its inputs do not get close to each other (i.e. the virtual zero condition is not satisfied), causing errors in the charge-transfer phase. This error can be crucial depending on the application. For example, in filters, using low-gain op-amps causes errors in both the amplitude and the phase response. In applications like ADCs and DACs, finite op-amp gain limits the resolution, and the requirements in such converters are an order of magnitude higher than in filters.

Vazgen Melikyan, Hayk Dingchyan, Artak Hayrapetyan, Arthur Sahakyan, Vardan Grigoryants, Synopsys Armenia CJSC

E-mail: <u>vazgenm@synopsys.com</u>, <u>haykd@synopsys.com</u> artakh@synopsys.com, <u>arthurs@synopsys.com</u>, vardang@synopsys.com respectively.

Vahe Babayan - State Engineering University of Armenia, E-mail: vahebab@gmail.com

Ashot Martirosyan - Russian-Armenian (Slavonic) University, E-mail: ashot23@gmail.com

Producing well-matched capacitors is easy in advanced CMOS technologies. However, because of the decreasing supply voltage, designing high-gain and high-bandwidth operational amplifiers with good output swing is becoming more difficult, as cascoding of devices limits the output signal swing. Many techniques have been proposed to overcome the low-gain opamp issue. One of those techniques is to use comparator-based switched capacitor (CBSC) circuits. CBSC circuits suffer from signal-dependent variation of the current source and overshoot due to the comparator delays, as shown in [1].

# II. CONVENTIONAL INTEGRATORS WITH USE OF CLS TECHNIQUE

Let us consider how the correlated level shifting technique works in conventional integrators. In the CLS technique, the sampling phase (Fig. 1(a)), $\varphi_1$, is the same as for the conventional switched-capacitor integrator (without CLS). The difference between the CLS and non-CLS circuits occurs in the charge-transfer phase, $\varphi_2$, which is subdivided into coarse and fine charge-transfers for the CLS case. During the coarse charge-transfer (Fig. 1(b)), the opamp (operational amplifier) settles as in the non-CLS switched-capacitor integrator. However, at the end of this settling, the output voltage, $V_{out}$, is sampled onto capacitor $C_{CLS}$ as a coarse estimate of the final value of $V_{out}$. During the fine charge-transfer (Fig. 1(c)), the voltage sampled onto $C_{CLS}$ is used as an offset between the output of the opamp and node $V_{out}$. Due to this correlated level shifting, the voltage at the output of the opamp depends only on the difference between the coarse estimate of $V_{out}$ and the final value of $V_{out}$. The transfer function of the integrator in the z-domain, given an op-amp DC gain of $A$, can be written as [7]

$$\frac{V_{out}}{V_{in}} = \left(\frac{C_{in}}{C_{fb}}\right) \left[\frac{(1-\delta)z^{-1}}{1-(1-\alpha)z^{-1}}\right] \tag{1}$$

where

$$\delta = \frac{1}{1 + A\beta(1 + A)};$$
  
$$\alpha = \frac{1 - \beta}{1 + A\beta(1 + A)} \text{ and } \beta = \frac{C_{fb}}{C_{in} + C_{fb}}$$

This decreased dependence on  $V_{out}$  reduces the effects of limited opamp gain, and potentially increases the swing of  $V_{out}$  beyond the range possible if  $V_{out}$  was directly driven by the opamp. The swing at the output of the op-amp will depend on the ratio of the level shifting capacitance to the total capacitance at the output. The larger the level shifting capacitor, the lower the swing but the higher the op-amp power consumption. Thus, there is a trade-off between the op-amp swing requirement and power consumption when choosing  $C_{CLS}$ .
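The effect of the $A(1+A)$ term in Eq. (1) can be illustrated numerically. The sketch below evaluates $\delta$, $\alpha$ and $\beta$ as defined above and the resulting integrator gain at DC ($z = 1$). The 27 dB opamp gain is the value quoted in the abstract; equal 150 fF capacitors and the function names are assumptions made for illustration.

```python
import math


def cls_coefficients(a_db, c_in, c_fb):
    """Return (delta, alpha, beta) of Eq. (1) for an opamp gain given in dB."""
    a = 10 ** (a_db / 20.0)
    beta = c_fb / (c_in + c_fb)
    denom = 1.0 + a * beta * (1.0 + a)
    return 1.0 / denom, (1.0 - beta) / denom, beta


def integrator_dc_gain_db(a_db, c_in, c_fb):
    """Magnitude of Eq. (1) at z = 1: (Cin/Cfb) * (1 - delta) / alpha."""
    delta, alpha, _ = cls_coefficients(a_db, c_in, c_fb)
    return 20.0 * math.log10((c_in / c_fb) * (1.0 - delta) / alpha)


# Assumed example: 27 dB opamp gain, Cin = Cfb = 150 fF.
gain_db = integrator_dc_gain_db(27.0, 150e-15, 150e-15)
print(f"CLS integrator DC gain = {gain_db:.1f} dB")
```

With equal capacitors the DC gain reduces to $A(1+A)$, i.e. roughly the square of the opamp gain, which is why a 27 dB opamp yields about 54 dB of effective gain.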

# III. CORRELATED LEVEL-SHIFTING IN CBSC CIRCUITS

We are proposing to use correlated level-shifting technique in comparator based switched capacitor circuits as well, expecting to have the same improvements in the output resistance of the current source and the effective gain of integrator.

In this technique,  $I_{fine}$  is not directly connected to node  $V_{out}$  but instead is capacitively coupled via capacitor  $C_{CLS}$ . During the coarse charge-transfer phase, an extra switch is closed, setting the voltage across  $C_{CLS}$  equal to the coarse estimation of  $V_{out}$ . This switch is then opened for the fine charge-transfer phase to be completed.

During the fine charge-transfer, when $I_{fine}$ is used to set $V_{out}$, the voltage seen by $I_{fine}$ is level-shifted through $C_{CLS}$ by a voltage equal to the coarse estimate of $V_{out}$. Thus, the voltage across current source $I_{fine}$ at the start of the fine charge-transfer is $V_{DD} - V_{SS}$. The change in voltage across $I_{fine}$ during the fine charge-transfer phase is only a function of the coarse overshoot, removing the dependence of current $I_{fine}$ on the full value of $V_{out}$. Because the coarse overshoot is relatively constant, $I_{fine}$ becomes much more constant. With this correlated level-shifting, the current $I_{fine}$ at the end of the fine charge-transfer becomes

$$\begin{split} I_{fine} &= I_{fine0} - \left(\frac{C_{FB}C_{CLS} + C_{IN}C_{CLS} + C_{FB}C_{IN}}{C_{FB}C_{CLS} + C_{IN}C_{CLS}}\right) \times \left(\frac{V_{overshoot,coarse}}{r_o}\right) \\ &= I_{fine0} - \left(\frac{C_{FB}C_{CLS} + C_{IN}C_{CLS} + C_{FB}C_{IN}}{C_{FB}C_{CLS} + C_{IN}C_{CLS}}\right) \times \left(\frac{\frac{dV_{out}}{dt_{coarse}} t_{delay,coarse}}{r_o}\right) \end{split} \tag{2}$$

where $\frac{dV_{out}}{dt_{coarse}}$ is the ramp rate of $V_{out}$ and $t_{delay,coarse}$ is the delay of the comparator during the coarse charge-transfer. If $t_{delay,coarse}$ is a constant, then the only dependence of $I_{fine}$ on $V_{out}$ now comes from $\frac{dV_{out}}{dt_{coarse}}$, which is given by

$$\frac{dV_{out}}{dt_{coarse}} = \frac{I_{coarse}}{C_{CLS} + \frac{C_{FB}C_{IN}}{C_{FB} + C_{IN}}} = \frac{I_{coarse0} + \frac{V_{out}}{r_{o,coarse}}}{C_{CLS} + \frac{C_{FB}C_{IN}}{C_{FB} + C_{IN}}} = \frac{\left(I_{coarse0} + \frac{V_{out}}{r_{o,coarse}}\right)(C_{FB} + C_{IN})}{C_{FB}C_{CLS} + C_{IN}C_{CLS} + C_{FB}C_{IN}} \tag{3}$$


where $r_{o,coarse}$ is the output impedance of current source $I_{coarse}$. Substituting Eq. (3) into Eq. (2) gives

$$\begin{split} I_{fine} &= I_{fine0} - \left(\frac{C_{FB}C_{CLS} + C_{IN}C_{CLS} + C_{FB}C_{IN}}{C_{FB}C_{CLS} + C_{IN}C_{CLS}}\right) \times \left[\frac{\left(I_{coarse0} + \frac{V_{out}}{r_{o,coarse}}\right)\left(C_{FB} + C_{IN}\right) t_{delay,coarse}}{\left(C_{FB}C_{CLS} + C_{IN}C_{CLS} + C_{FB}C_{IN}\right)r_{o}}\right] \\ &= I_{fine0} - I_{coarse0}\left(\frac{t_{delay,coarse}}{r_{o}C_{CLS}}\right) - \left(\frac{V_{out}}{r_{o}}\right)\left(\frac{t_{delay,coarse}}{r_{o,coarse}C_{CLS}}\right) \end{split} \tag{4}$$

Assuming that $t_{delay,coarse}$ is a constant, the $I_{coarse0}$ term is a constant. The $\frac{V_{out}}{r_o}$ term represents the new dependence of $I_{fine}$ on $V_{out}$. We see from Eq. (4) that the output resistance of $I_{fine}$ has effectively been multiplied:

$$r_{o,CLS} = r_o \left(\frac{r_{o,coarse}C_{CLS}}{t_{delay,coarse}}\right) \tag{5}$$
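The simplification used in Eq. (4) and the size of the ratio $r_{o,coarse}C_{CLS}/t_{delay,coarse}$ appearing in Eq. (5) can be checked numerically. The sketch below is illustrative only: the 100 kΩ output resistance and the 1 ns comparator delay are hypothetical values, and the function names are not from the paper.

```python
def capacitor_term(c_fb, c_in, c_cls):
    """Product of the capacitor ratio in Eq. (2) and the capacitor
    factor of Eq. (3); algebraically this reduces to 1 / C_CLS."""
    total = c_fb * c_cls + c_in * c_cls + c_fb * c_in
    return (total / (c_fb * c_cls + c_in * c_cls)) * ((c_fb + c_in) / total)


def multiplication_factor(r_o_coarse, c_cls, t_delay):
    """Dimensionless ratio r_o,coarse * C_CLS / t_delay,coarse."""
    return r_o_coarse * c_cls / t_delay


C = 150e-15            # 150 fF for all capacitors (assumed equal)
R_O_COARSE = 100e3     # hypothetical output resistance of I_coarse
T_DELAY = 1e-9         # hypothetical comparator delay

print(capacitor_term(C, C, C) * C)                    # close to 1.0
print(multiplication_factor(R_O_COARSE, C, T_DELAY))  # close to 15
```

Under these assumed values the $V_{out}$ dependence of $I_{fine}$ is weakened by about a factor of 15; a larger $C_{CLS}$ or a faster comparator increases the factor further.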

To further improve the output impedance of the gated current source $I_{fine}$, one can also use cascoding, but this would require more voltage headroom and thus a higher voltage on the power supply rails. Since the power supply voltage is reduced year by year due to the scaling of CMOS technologies, the use of a cascode structure to increase the output resistance of the current source has not been investigated.

Note that for large values of $C_{CLS}$, $I_{fine}$ is able to pull $V_{out}$ beyond $V_{DD}$. This reduces the lifetime and reliability of the transistors, so at the design stage it must be verified that the voltage does not exceed the power supply rails.



(a) Sampling phase







Fig. 1. Correlated level shifting technique. Sampling phase,  $\varphi_1 =$  "1"; charge-transfer phase (preset + coarse),  $\varphi_2 =$  "2".

# **IV. SIMULATION RESULTS**

The transfer functions of the conventional integrator of Fig. 1 (but without the CLS capacitor) and of the proposed CLS integrator of Fig. 2 were simulated with HSPICE using SNAC (Shooting Newton AC) analysis. The DC gain of the opamp was chosen to be 27 dB and the capacitors were chosen such that $C_{FB} = C_S = C_{CLS}$, where $C_S$ is 150 fF. As can be seen from Fig. 3, the conventional integrator has a low-frequency gain of 27 dB, while the CLS integrator obtains a low-frequency gain of 54 dB (double that of the conventional integrator in dB). The integrator was also used to design 2-0 MASH (multi-stage noise shaping) modulators. The opamp DC gain was maintained at 27 dB while the two integrators in the loop were designed to have a closed-loop gain of 0.5 V/V each.



Fig. 2. Correlated level-shifting used in switched capacitor integrator: Sampling phase,  $\phi_1 = "1"$ ; charge-transfer phase (preset + coarse + fine),  $\phi_2 = "2"$ .

Simulation results from MATLAB in Fig. 4 show that the modulator with the conventional integrator suffers the worst quantization noise leakage. It achieves an SQNR of 65.0 dB at an OSR of 64, while the modulator with the CLS integrator achieves an SQNR of 86.0 dB. The difference in SQNR arises because the effective low-frequency gain is 54 dB in the proposed integrator and only half of that in dB (27 dB) in the conventional integrator.
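For orientation, the two SQNR values can be translated into an effective number of bits using the standard relation $\mathrm{ENOB} = (\mathrm{SQNR} - 1.76)/6.02$ for a full-scale sine input. This conversion is not part of the simulations reported here; it is only a convenient way to read the 21 dB difference.

```python
def enob(sqnr_db):
    """Effective number of bits for a full-scale sine wave input."""
    return (sqnr_db - 1.76) / 6.02


for label, sqnr in (("conventional", 65.0), ("CLS", 86.0)):
    print(f"{label}: SQNR = {sqnr:.1f} dB -> ENOB = {enob(sqnr):.1f} bits")
```

The 21 dB improvement corresponds to roughly 3.5 additional bits of resolution.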



Fig. 3. Simulated frequency response of integrators

# V. CONCLUSION

A gain enhancement technique has been proposed for comparator-based integrators. The loop gain enhancement has been analysed theoretically for a comparator-based integrator with the correlated level shifting technique applied, and confirmed by HSPICE/MATLAB simulations. The proposed comparator-based integrator with correlated level shifting enhancement was also used to build a 2-0 MASH sigma-delta modulator which achieves an SQNR of 86.0 dB at an OSR of 64.



Fig. 4. Simulated output spectra of modulators

# REFERENCES

- [1] Brooks, L., Lee, H. "A 12b, 50 MS/s, Fully Differential Zero-Crossing Based Pipelined ADC", IEEE J. Solid-State Circuits, Dec. 2009, vol.44, no.12, pp. 3329-3343
- [2] Gregoire, B.R., Moon, U. "An Over-60 dB True Rail-to-Rail Performance Using Correlated Level Shifting and an Opamp With Only 30 dB Loop Gain", IEEE J. Solid-State Circuits .- Dec. 2008. - vol. 43, no.12, pp. 2620-2630
- [3] Nagaraj, K., Vlach, J., Viswanathan, T.R., and Singhal, K., "Switched-capacitor integrator with reduced sensitivity to amplifier gain", Electron. Lett, 1986, 22, (21), p 16.
- [4] R. Goldman, K. Bartleson, T. Wood et al, "Synopsys Open Educational Design Kit: Capabilities, Deployment and Future" Proceedings of the International Conference on Microelectronic Systems Education, San Francisco, USA, July 2009.- P. 45-48.p. 1103– 1105
- [5] HSPICE Simulation and Analysis User Guide, 2010.
- [6] Nagaraj, K., Viswanathan, T.R., Singhal, K., and Vlach, J. "Switch-capacitor circuits with reduced sensitivity to amplifier gain", IEEE Trans. Circuits Syst., 1987, CAS-34, (5), pp. 571–574
- [7] Mingliang L., "Demystifying Switched Capacitor Circuits", Newnes, May 11, 2006

- [8] Musah, T., Gregoire, B.R., Naviasky, E., and Moon, U.: "Parallel correlated double sampling technique for pipelined analogue-to-digital converters", Electron. Lett., 2007, 43, (23), pp. 1260–1261
- [9] Y.Kook, Jipeng Li, Bumha Lee, Moon, U.: "Low-Power and High-Speed Pipelined ADC Using Time-Aligned CDS Technique" IEEE 2007 CICC, pp 321-324.
- [10] Naga Sasidhar, Youn-Jae Kook, Seiji Takeuchi, Koichi Hamashita,Kaoru Takasuka, Pavan Kumar Hanumolu, and Un-Ku Moon.: "A Low Power Pipelined ADC Using Capacitor and Opamp Sharing Technique With a Scheme to Cancel the Effect of Signal Dependent Kickback", IEEE J. SOLID-STATE CIRCUITS, VOL. 44,pp 2392-2401

# Wideband Low Noise Amplifier for Long Term Evolution Systems

Jelena Mišić, Vera Marković

*Abstract* – This paper introduces a wideband low noise amplifier design for Long Term Evolution systems. A three-stage, cascade wideband low noise amplifier structure is presented. The low noise amplifier (LNA) is designed for the uplink channels of LTE systems, for an LTE receiver front-end which operates in the 700-1300 MHz frequency range, covering almost the entire LTE uplink frequency band. Wide bandwidth performances are presented. The LNA provides gain above 30 dB and its noise figure is 1.2-1.7 dB. The input and output reflection coefficients are lower than -30 dB over the whole frequency range.

*Keywords* — long term evolution, low noise amplifier, wideband amplifier

#### I. INTRODUCTION

In recent years, mobile communications systems have become the main type of communications in the world. There is a global need to communicate with anyone, at anytime and from anywhere, and only wireless mobile communications systems make that possible. Due to that the demands for new and improved services and commodities become higher.

Mobile communications systems have moved rapidly through a series of generations, starting in the early 1980s with the first generation, whose main characteristic is the use of analogue transmission techniques. During the early 1990s, the rapid increase in the use of the Internet started at the same time as 2G digital systems came into widespread use. The 2.5 generation was the first to enable mobile Internet access, followed by 3G with further improvements in broadband data transmission. Both 2.5G and 3G mobile systems process/switch voice and data through two separate domains: circuit-switched (CS) for voice and packet-switched (PS) for data. Although 3G systems support TCP/IP traffic, they are not fully IP oriented. Considering the need to support high-speed Internet services in mobile communications systems, a new technology was necessary. This led to the next stage in mobile network development, in which networks are fully IP oriented [1].

The Long Term Evolution (LTE) system defined by the *3rd Generation Partnership Project* (3GPP) in Release 8 provides users with much faster data speeds than 3G is able to. Many consider that LTE should be labelled as 3.9G, and according to them the first "true 4G" is LTE Advanced, defined in Release 10. LTE and LTE Advanced systems have many advantages for both end users and mobile operators. End users can have better performance and a higher number of services on their mobile devices, while mobile operators can improve their networks in order to provide wide-bandwidth service with lower latency and a higher level of mobility.

Given the demanding LTE requirements, interest in designing appropriate LTE devices has increased. The receiver part in particular has become the centre of attention, and in recent years many LTE front-ends have been designed.

A variety of amplifier topologies are used in those receiver front-end blocks. A 2.3 GHz narrowband low noise amplifier is presented in [2]. It is designed for WiMax but it can be implemented in any system which works at the mentioned frequency. The proposed LNA has a very simple structure: it implements a single-stage common-gate topology and its features are 15 dB gain and a noise figure of 1.1 dB. In [3] an LTE wideband low noise amplifier is presented with 38 dB gain and a 4.5 dB noise figure. Both the narrowband and the wideband amplifiers employ source degeneration as the noise cancelation technique. The same noise cancelation technique is used in [4], where a low-power CMOS receiver front-end for LTE systems is presented. The presented receiver has a folded cascade topology and consists of a wideband common-source low noise amplifier and a mixer. The illustrated front-end operates from 2545 to 2700 MHz, which covers 7 frequency bands. This front-end achieves 8.89 dB gain and an 8.25 dB noise figure.

Nowadays, the most widely used topology for low noise amplifiers is the cascade topology, because of its high gain and low noise figure [5], [6]. In [5] a cascade low noise amplifier with optimal noise figure, high gain and good input and output matching is designed. In the implemented cascade structure two inductors are used. The input inductor is used for input matching and noise reduction, and the output inductor is used for output matching and acceptable OIP3. In [6] a two-stage low noise amplifier is presented. It consists of a common-gate stage and a common-source stage. The common-gate stage is employed as the input stage, and the common-source stage is employed as the output stage. The input stage provides a low noise figure and good input matching, while the output stage provides high-frequency gain and output power matching. By combining these two stages good wideband (0.4-10 GHz) features are achieved: the power gain is around 12 dB, and the noise figure is between 4.4 and 6.5 dB.

Jelena Misic and Vera Markovic are with the Department of Telecommunications, Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, 18000 Niš, Serbia and Montenegro, E-mail: jelena.eka@gmail.com, vera.markovic@elfak.ni.ac.rs

The design of a 0.7-2.7 GHz LTE low noise amplifier is given in [7], with 17.3 dB gain and 2 dB noise figure obtained. However, noise figure and gain are directly dependent, so it can be expected that with the increase of gain the noise figure will increase too, and vice versa.

In this paper, a LTE low noise amplifier with improved noise cancelation technique will be presented.

This paper is organized in five sections. In Section 2 the theoretical background of LTE is provided. In Section 3 the LTE receiver fundamentals are presented. In Section 4 LNA design and its performances are illustrated, and Section 5 contains summarized results and future research trends and developments.

# II. LTE ARCHITECTURE

As already noted, the need for high-speed Internet and new services was the primary requirement for a new mobile communications technology. However, that was not the only requirement for 4G system technology; there was also a need for more spectrum resources. Therefore, a wider spectrum and better spectrum efficiency have been necessary to achieve the requirements set for 4G systems.

With more spectrum coming into use, there is a need to operate in a large number of different frequency bands, which can be of different sizes and are sometimes fragmented in the spectrum. High spectrum flexibility, with the possibility of a varying channel bandwidth, was therefore needed.

The overall aim of providing a new radio-access technology is to switch to packet-switched data only. In parallel with the development of the new radio-access technology, it was necessary to develop a new network architecture, including both the radio access network and the core network.

The network architecture of LTE [8] comprises the following three main components (Fig. 1):

- the user equipment (UE), usually called the end user,
- the Evolved Universal Terrestrial Radio Access Network (E-UTRAN), and
- the Evolved Packet Core (EPC).

The E-UTRAN provides the radio communications between the end user and the Evolved Packet Core. The E-UTRAN consists of just one type of component, the evolved base station, called eNodeB or eNB (evolved NodeB). Each eNB is a base station that controls mobile users in one or more cells.



Fig. 1. The architecture of LTE network

The EPC was introduced by 3GPP in Release 8. The main idea was to develop a flat architecture and to separate the user data and the signaling. Thanks to that data split, the operators can dimension and adapt their network easily. The key components of EPC are:

- Mobility Management Entity (*MME*), which manages session states and tracks a user across the network,
- Serving Gateway (S-gateway), which routes data packets through the access network,
- Packet Data Network Gateway (PGW), which acts as the interface between the LTE network and other packet data networks, manages quality of service (QoS), and provides deep packet inspection (DPI) and the Policy and Charging Rules Function (PCRF).

The architecture of the EPC is organised such that the user equipment (UE) is connected to the EPC over the Evolved Universal Terrestrial Radio Access Network (E-UTRAN), which is the LTE access network. The EPC is connected to the external networks, which can include the IP Multimedia Core Network Subsystem (IMS).

The LTE radio access, E-UTRA, is based on a multiple access technique called Orthogonal Frequency Division Multiple Access (OFDMA) in the downlink, while the uplink uses Single Carrier Frequency Division Multiple Access (SC-FDMA). The basic idea of OFDMA is that the total data stream is divided into a number of streams that are transmitted in parallel, using specific orthogonal frequency subcarriers. The OFDM technique is extremely resistant to frequency-selective fading, which was the biggest problem with wideband channels. Each of the sub-channels in an LTE system is 15 kHz wide and is modulated with one of the conventional modulations: QPSK, 16QAM or 64QAM. OFDM provides some additional benefits: access to the frequency domain, flexible transmission bandwidth (1.4 MHz, 3 MHz, 5 MHz, 10 MHz, 15 MHz and 20 MHz), broadcast/multicast transmission, the possibility of carrier aggregation (introduced in Release 10), etc.
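The relation between the 15 kHz subcarriers and the listed channel bandwidths can be sketched as follows. The resource-block counts (6, 15, 25, 50, 75 and 100) and the grouping of 12 subcarriers per resource block are taken from the 3GPP LTE specifications rather than from this paper, and the function name is purely illustrative.

```python
SUBCARRIER_SPACING_HZ = 15e3
SUBCARRIERS_PER_RB = 12

# Channel bandwidth in MHz -> number of resource blocks (3GPP values).
RESOURCE_BLOCKS = {1.4: 6, 3: 15, 5: 25, 10: 50, 15: 75, 20: 100}


def occupied_bandwidth_mhz(channel_bw_mhz):
    """Bandwidth actually occupied by subcarriers for a given channel."""
    n_rb = RESOURCE_BLOCKS[channel_bw_mhz]
    return n_rb * SUBCARRIERS_PER_RB * SUBCARRIER_SPACING_HZ / 1e6


for bw in sorted(RESOURCE_BLOCKS):
    n_sc = RESOURCE_BLOCKS[bw] * SUBCARRIERS_PER_RB
    print(f"{bw} MHz: {n_sc} subcarriers, "
          f"{occupied_bandwidth_mhz(bw):.2f} MHz occupied")
```

The occupied bandwidth is slightly smaller than the nominal channel bandwidth; the remainder serves as guard band.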

Fig. 2. Cascade topology

In the carrier aggregation solution, multiple LTE carriers can be transmitted in parallel to/from the same terminal. Thereby, wider bandwidth and correspondingly higher data rates are provided. Up to five component carriers with bandwidths up to 20 MHz can be aggregated, so the overall transmission bandwidth can be up to 100 MHz. Aggregated component carriers do not need to be contiguous in the spectrum. With respect to the frequency location of the aggregated carriers, three different cases can be identified:

- 1. intra-band aggregation with frequency-contiguous component carriers,
- 2. intra-band aggregation with frequency non-contiguous component carriers, and
- 3. inter-band aggregation with frequency non-contiguous carriers.

The possibility to aggregate non-contiguous component carriers allows operators to operate with a fragmented spectrum. Thereby mobile operators can provide high data rate services even though they do not possess a single wideband spectrum allocation.
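The three aggregation cases can be expressed as a small classification routine. This is a simplified sketch: the `Carrier` type, the band number and the frequencies are hypothetical, and contiguity is judged only by whether neighbouring carriers abut, which ignores the exact carrier spacing rules of the standard.

```python
from typing import NamedTuple


class Carrier(NamedTuple):
    band: int          # operating band index (hypothetical numbering)
    f_low_mhz: float   # lower edge of the component carrier
    f_high_mhz: float  # upper edge of the component carrier


def aggregation_type(carriers, tol_mhz=0.0):
    """Classify a set of component carriers into one of the three cases."""
    if len({c.band for c in carriers}) > 1:
        return "inter-band non-contiguous"
    ordered = sorted(carriers, key=lambda c: c.f_low_mhz)
    contiguous = all(
        nxt.f_low_mhz - cur.f_high_mhz <= tol_mhz
        for cur, nxt in zip(ordered, ordered[1:])
    )
    return "intra-band contiguous" if contiguous else "intra-band non-contiguous"


def total_bandwidth_mhz(carriers):
    """Sum of the component carrier bandwidths."""
    return sum(c.f_high_mhz - c.f_low_mhz for c in carriers)


# Five adjacent 20 MHz carriers in one (hypothetical) band.
five = [Carrier(1, 2000.0 + 20 * i, 2020.0 + 20 * i) for i in range(5)]
print(aggregation_type(five), total_bandwidth_mhz(five))
```

Five adjacent 20 MHz carriers give the maximum aggregated bandwidth of 100 MHz mentioned above.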

# III. THE LTE RECEIVER

In accordance with the previously mentioned LTE performance, there are several basic requirements for an LTE receiver [9]:

- high gain and low noise figure at the same time; these two requirements often exclude each other, so some trade-offs have to be made,
- a sufficient level of sensitivity, which represents the lowest level of the received signal that can be detected. The sensitivity level must be lower than the expected level of the signal, otherwise the signal will not be detected,
- good selectivity, which is defined as the receiver's ability to extract the desired signal in the presence of other signals that interfere with it. Due to the increasing number of wireless services, the spectrum is becoming completely filled. Therefore, it is important that the receiver has appropriate selectivity in order to receive a signal at a given frequency,
- good linearity, dynamic range, etc.

Low noise amplifiers are key components in the receiver of any communications system. Because the level of the received signal is very low, the low noise amplifier should raise the level of the incident signal without introducing significant noise and distortion. As is well known, the most important contribution to the total noise factor of a receiver system is the noise factor of its first stage. For that reason the low noise amplifier, as the first stage in the receiver, must have a very low noise factor.
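The dominance of the first stage follows from the Friis formula for cascaded stages, $F = F_1 + (F_2 - 1)/G_1 + (F_3 - 1)/(G_1 G_2) + \dots$ The sketch below applies it to a two-stage chain; the gain and noise figure values are hypothetical and are not the values of the designed LNA.

```python
import math


def cascade_noise_figure_db(stages):
    """Total noise figure (dB) of cascaded stages via the Friis formula.

    stages: list of (gain_db, noise_figure_db) tuples, first stage first.
    """
    total_f = 0.0
    gain_product = 1.0  # linear gain of all preceding stages
    for index, (gain_db, nf_db) in enumerate(stages):
        f = 10 ** (nf_db / 10.0)
        total_f += f if index == 0 else (f - 1.0) / gain_product
        gain_product *= 10 ** (gain_db / 10.0)
    return 10.0 * math.log10(total_f)


# Hypothetical chain: LNA (30 dB gain, 1.5 dB NF) followed by a
# noisier stage (10 dB gain, 10 dB NF).
lna_first = cascade_noise_figure_db([(30.0, 1.5), (10.0, 10.0)])
lna_second = cascade_noise_figure_db([(10.0, 10.0), (30.0, 1.5)])
print(f"LNA first: {lna_first:.2f} dB, LNA second: {lna_second:.2f} dB")
```

With the LNA in front, the total noise figure stays close to that of the LNA itself; with the order reversed, the noisy stage dominates.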

In the literature, various designs of the LNA can be found [2]-[7]. The main differences between them are LNA topology and active component in LNA.

# IV. LNA design

The goals in LNA design are to maximise its gain and minimise its noise figure with sufficient linearity and impedance matching. It should be highlighted that it is impossible to design low noise amplifier with peak performances for all criteria, because some of them exclude each other, so some trade-offs must be made.

In order to meet the key demands for LTE receiver characteristics, the LNA is designed starting from the following targets: a noise figure below 2 dB and a gain above 20 dB over the whole frequency range of interest. Also, good input and output impedance matching should be achieved: the parameters $S_{11}$ and $S_{22}$ must be lower than -30 dB.

The features of a low noise amplifier are limited by the properties of its active device. For that reason, the selection of an active device with the corresponding parameters is a crucial step in reaching the target LNA specifications. Many active components can be suitable for an LNA, for instance the Bipolar Junction Transistor (BJT), the Heterojunction Bipolar Transistor (HBT), the Metal Semiconductor Field Effect Transistor (MESFET), the High Electron Mobility Transistor (HEMT), etc. Although MESFET technology has become widespread, the BJT still represents the transistor of choice for many amplifiers because of its greater linearity and ease of manufacture.
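What the -30 dB reflection target stated in the abstract means in practice can be seen by converting it into a reflection coefficient magnitude, a reflected power fraction and a VSWR. The conversions are standard; the function names are illustrative.

```python
def reflection_magnitude(s_db):
    """Linear magnitude of a reflection coefficient given in dB."""
    return 10 ** (s_db / 20.0)


def reflected_power_fraction(s_db):
    """Fraction of the incident power that is reflected."""
    return reflection_magnitude(s_db) ** 2


def vswr(s_db):
    """Voltage standing wave ratio for a reflection coefficient in dB."""
    gamma = reflection_magnitude(s_db)
    return (1.0 + gamma) / (1.0 - gamma)


target_db = -30.0
print(f"|S11| = {reflection_magnitude(target_db):.4f}")
print(f"reflected power = {reflected_power_fraction(target_db) * 100:.2f} %")
print(f"VSWR = {vswr(target_db):.3f}")
```

At -30 dB only about 0.1 % of the incident power is reflected, which indicates a very good match.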



Fig 3. The low noise amplifier circuit



Fig 4. (a) S parameters and noise figure over the frequency range 0-2 GHz (b) parameters  $S_{11}$  and  $S_{22}$  over the frequency range 0.6-1.4 GHz

Due to the large voltage gain, very low cost and high robustness of the BJTs, it was decided to design the LNA with BJT transistors. Also, the BJT transistor is very easy to bias. The NXP bfg520 BJT transistor [10] is chosen to be used in this LNA design.

The amplifier was designed in the Advanced Design System (ADS) software environment from Agilent Technologies.

In general, there are many possible LNA topologies, including the single-transistor topology, cascode-based topologies, and various two-stage (or multi-stage) topologies, called cascade topologies. The topology whose performance best meets the LNA requirements of a particular application should be chosen.

If the amplification of a single stage is not sufficient for a particular application, cascode or cascade topologies can be used. The cascode structure consists of a certain number of active components; it can be realised as a simple cascode or as a cascode with some variations. The cascade topology with more than one transistor is often used to achieve a higher overall gain. It can be a simple cascade or a cascade with some modification, such as a noise-cancellation cascade topology.

Because of its high gain, the cascade topology was chosen for this design, Fig. 2. Three amplifiers are cascaded, using the common-emitter arrangement because it is easy to realise and implement and provides a high gain level. There are various ways of coupling cascaded amplifier stages. Direct coupling is used in this design because of its simple circuit arrangement and low cost. With direct coupling, the output of one transistor is connected directly to the input of the next transistor.

Negative feedback is used to make the amplifier more stable. The feedback is implemented by a thin film resistor (TFR) component. A TFR is a resistor made from a thin film conductor with a sheet resistance of 50 Ohm/square. The conductor width and length were chosen to achieve the most suitable resistance for our device, and were also included in the simulation and optimization process. Negative feedback reduces gain, but the three amplifier stages provide enough gain that a relatively large amount of feedback can be used without falling short of the required gain.
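For a given sheet resistance, the TFR value depends only on the aspect ratio of the conductor, $R = R_{sheet} \cdot L / W$. A minimal Python sketch of this relation follows; the dimensions used are hypothetical, since the optimized TFR dimensions are not listed in the text.

```python
def thin_film_resistance(sheet_resistance_ohm_sq, length_um, width_um):
    """Resistance of a rectangular thin film resistor, R = R_sheet * L / W."""
    if length_um <= 0 or width_um <= 0:
        raise ValueError("length and width must be positive")
    return sheet_resistance_ohm_sq * length_um / width_um


# 50 Ohm/square is the value quoted in the text; the 200 um x 20 um
# geometry is an assumed example (10 squares).
example_resistance_ohm = thin_film_resistance(50.0, 200.0, 20.0)
print(example_resistance_ohm)
```

Because only the ratio $L/W$ matters, an optimizer can trade length against width to fit the layout while keeping the feedback resistance fixed.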

The emitter degeneration technique is used in this design as a noise cancellation technique. It consists of inserting an element between the emitter and ground. Usually a lumped component is employed for this purpose. Since microstrip lines are very widely used, it was decided in this work to realise the emitter degeneration with microstrip lines. Three microstrip lines of chosen dimensions, one per transistor, were placed between the transistor emitters and ground.

To reduce losses in the amplifier circuit, appropriate impedance matching is necessary at both ports. The matching networks are realised with microstrip lines. The microstrip substrate is Rogers TMM 10i laminate [11]. The most appropriate lengths and widths of the microstrip lines were determined by optimization in ADS. During the optimization the line lengths and widths were varied while the substrate parameters were held constant. The bias points of the transistors were also left unchanged; only the dimensions of the microstrip elements were optimized. The proposed amplifier circuit is presented in Fig. 3.

Simulation results for the designed LNA after optimization, obtained in the same ADS environment, are presented in Fig. 4.

Fig. 4 shows the results of the *S* parameter analysis of the designed amplifier. All *S* parameters and the noise figure are displayed, with 700-1300 MHz being the frequency band of interest.

The graph in Fig. 4(a) shows the values of the *S* parameters and the noise figure in the range 0-2 GHz. As can be seen from the graph in Fig. 4(a), the $S_{11}$ and $S_{22}$ curves are very close to each other, so to give a better view of their values they are displayed separately in Fig. 4(b) over a narrower frequency range, 600 MHz - 1400 MHz. That range is still wider than the frequency band of interest. On both graphs, markers are positioned at the boundaries of the band in order to show the parameter values at the band edges (700 MHz and 1300 MHz).

As shown in Fig. 4, all *S* parameters meet the essential criteria over the specified frequency range, so satisfactory performance is obtained. Parameter $S_{21}$ is above 21 dB over the whole frequency range of interest and exceeds 30 dB in the 700-1000 MHz band. Impedance matching is also good at both ports: $S_{11}$ and $S_{22}$ are lower than -30 dB. The amplifier isolation is also very good, with $S_{12}$ lower than -50 dB. The average noise figure of the LNA is around 1.5 dB, which is excellent for a three-stage wideband amplifier. It was also shown that the emitter degeneration technique can be implemented with microstrip lines instead of lumped elements.
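To put the matching figures into more familiar terms, the sketch below converts a reflection parameter given in dB into a reflection coefficient magnitude, a VSWR and a reflected power fraction. It uses only the standard definitions; the -30 dB input is the matching limit quoted in the text.

```python
def reflection_magnitude(s_db):
    """Magnitude of the reflection coefficient for |S11| or |S22| in dB."""
    return 10.0 ** (s_db / 20.0)


def vswr(s_db):
    """Voltage standing wave ratio corresponding to a reflection level in dB."""
    gamma = reflection_magnitude(s_db)
    return (1.0 + gamma) / (1.0 - gamma)


def reflected_power_fraction(s_db):
    """Fraction of the incident power that is reflected."""
    return 10.0 ** (s_db / 10.0)


match_db = -30.0
print(f"|Gamma| = {reflection_magnitude(match_db):.4f}")
print(f"VSWR    = {vswr(match_db):.3f}")
print(f"Reflected power = {100.0 * reflected_power_fraction(match_db):.2f} %")
```

At -30 dB only about 0.1 % of the incident power is reflected, which corresponds to a VSWR of roughly 1.07.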

# V. CONCLUSION

A three-stage wideband LNA intended for the LTE system has been proposed. The common-emitter topology is implemented in each amplifier stage. Emitter degeneration with microstrip lines is employed as a noise-cancelling technique. Although each transistor has a noise figure of around 1 dB, a low overall noise figure is achieved. A very high gain is obtained even though emitter degeneration is implemented. Wideband input and output impedance matching is also provided. It can be concluded that the proposed amplifier is appropriate for the current LTE requirements.

The LTE system is expected to improve further. Accordingly, all devices intended for LTE systems must improve their performance. In light of that trend, further research will be directed to developing low noise amplifiers and other devices of the LTE receiver front-end with improved performance, including better linearity, simplicity, low cost and higher energy efficiency.

## ACKNOWLEDGEMENT

The results presented in this paper were obtained during the research in the following projects: Erasmus Mundus EUROWEB project funded by the European Commission and III43012 project funded by the Ministry of Education and Science of the Republic of Serbia.

# References

[1] Ajay R. Mishra, "Fundamentals of Cellular Network Planning and Optimisation – 2G/2.5G/3G...Evolution to 4G", John Wiley & Sons Ltd, 2004.

[2] Biswas, I.; Deka, A.J.; Bose, S.C., "Design of a 2.3 GHz Low Noise Amplifier for WIMAX Applications", Devices, Circuits and Systems (ICDCS), 2012 International Conference on, pp. 105-109, 15-16 March 2012.

[3] Hoai-Nam Nguyen; Viet-Hoang Le; Ki-Uk Gwak; Jeong-Yeol Bae; Seok-Kyun Han; Sang-Gug Lee, "Low power, high linearity wideband receiver front-end for LTE application", Advanced Communication Technology (ICACT), 2011 13th International Conference on, pp. 640-643, 13-16 Feb. 2011.

[4] Kuang-Hao Lin; Tai-Hsuan Yang; Jan-Dong Tseng, "A low power CMOS receiver front-end for long term evolution systems", SoC Design Conference (ISOCC), 2012 International, pp. 439-442, 4-7 Nov. 2012.

[5] Staudinger, J.; Hooper, R.; Miller, M.; Yun Wei, "Wide bandwidth GSM/WCDMA/LTE base station LNA with ultra-low sub 0.5 dB noise figure", Radio and Wireless Symposium (RWS), 2012 IEEE, pp. 223-226, 15-18 Jan. 2012.

[6] Ke-Hou Chen; Jian-Hao Lu; Bo-Jiun Chen; Shen-Iuan Liu, "An Ultra-Wide-Band 0.4–10-GHz LNA in 0.18-μm CMOS", Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. 54, no. 3, pp. 217-221, March 2007.

[7] Hidayov, O.; Il Hoon Jang; Seok Kyun Han; Sang-Gug Lee; Cartwight, J., "A wide-band CMOS low noise amplifier for LTE application", Intelligent Radio for Future Personal Terminals (IMWS-IRFPT), 2011 IEEE MTT-S International Microwave Workshop Series on, pp. 1-3, 24-25 Aug. 2011.

[8] Christopher Cox, "An Introduction to LTE", John Wiley & Sons Inc, Feb 28, 2012.

[9] Yichuang Sun, "Wireless communication Circuits and Systems", The Institution of Electrical Engineers, 2004.

[10] www.nxp.com

[11] www.rogerscorp.com/acm/products/50/TMM-10i-Laminates.aspx

# Testing Capacitors' Hard Defects in Notch SC Filters Using the Oscillation Method

# Miljana Milić and Vančo Litovski

*Abstract* - The possibilities for applying the oscillation method to testing Switched Capacitor (SC) biquad Notch filter cells with a high quality factor are analyzed in this paper. When applying this method, many conditions must be met in order to test the circuit properly and to obtain sustained and stable oscillations. After solving these problems we have created a fault dictionary that maps hard defects of capacitors to the circuit response. Simulations in LTspice show that this testing concept is feasible and acceptable for the chosen class of filter cells, especially bearing in mind all the advantages of the oscillation method.

Keywords - SC filters, Oscillation method, Analogue circuit testing.

## I. INTRODUCTION

Electronic circuit testing is a very important phase of electronic circuit design. It requires a lot of time, money, and resources. In order to confirm the correct functioning of the design, one should find a way to check whether the circuit response fits the definition of correct functioning [1]. Usually, preparation of the testing process follows two main steps:

- 1. To establish an input signal (its waveform) which will make the responses of the fault free (FF) and faulty circuit to differ, and
- 2. Among all such signals, to choose the one that enables fastest and cheapest testing.

Only correct designs can qualify for a correct project. One important aspect of circuit design is the implementation of the Design for Testability (DFT) concept. A second, more important aspect is test signal synthesis. Here the designer has to generate the signal that will be applied to the inputs of the Device Under Test (DUT), and to create a list of the required FF responses, which will be compared with those obtained during DUT testing.

On the other hand, problems that occur during analogue circuit testing are numerous. They arise from the lack of accessible internal circuit nodes, nonlinearities in the circuit, the presence of various noise sources, parameter variations, and others [2].

An important tool that helps to solve many of these problems is the fault simulator. In order to prepare a test for a particular DUT successfully, it is necessary to define the set of most probable defects, to describe their models, and to embed them into the circuit description. The final result of this procedure should be a fault dictionary, which can later be used for both testing and diagnostics [3], [4].

Filters are one of the crucial types of electronic devices and an indispensable part of every modern electronic system. Bearing in mind the requirements for small circuit size, on-chip integration, and reduced design costs, the best solution for analog filter implementation is the SC technique.

Circuits realized with capacitors, resistors, and operational amplifiers have many drawbacks, such as large component tolerances, which can affect the accuracy of the functions to be performed. By contrast, the accuracy of SC circuits is determined by the accuracy of the capacitance ratios in them. The SC technique is a clever application of switching small on-chip capacitances in order to obtain the same behaviour and functionality as large resistors in a MOS integrated circuit. Without the SC technique, these resistors would occupy large chip areas [5], [6].

The main advantages of the SC technique are [7]:

- compatibility with the standard CMOS technology,
- high time constant accuracy,
- good linearity,
- good temperature characteristics.

Its disadvantages are:

- adverse influence of the clock signal,
- the requirement for non-overlapping clock signals,
- bandwidth limitation due to the use of a non-overlapping clock.

One possible solution to the problem of SC filter testing is the oscillation method. With this method, the filter to be tested is converted into an oscillator by establishing positive feedback [2], [8]. This avoids the dilemma of whether to use frequency domain or time domain analysis. Since we have a filter cell, the most natural analysis domain would be the frequency domain [1]. That, however, would raise a number of new questions about the frequency, amplitude and phase of the test signal. Similarly, analysis in the time domain raises questions regarding the shape of the input test signal [1].

Miljana Milić is with the Department of Electronics, Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, 18000 Niš, Serbia,

E-mail: miljana.milic@elfak.ni.ac.rs.

Vančo Litovski is with the Cluster of Advanced Technologies, Bul. Svetog Cara Konstantina 80-86, 18000 Niš, Serbia, E-mail: vanco.litovski@elfak.ni.ac.rs.

When the oscillation-based testing (OBT) concept is used, there is no need to perform test signal synthesis, since an oscillator needs only a power supply to maintain stable oscillations. A special feature of OBT is that it covers all defects in the circuit. Namely, in structural test synthesis, it is necessary to generate one test signal for each potential defect of the circuit. With OBT, the effects of all potential defects are expected to influence the oscillator's output signal. This dramatically reduces the test engineer's work.

Since only one signal is observed during OBT, namely the output signal of the filter/oscillator, observability is always guaranteed. The filter's output is always accessible, and by observing it one can determine whether the circuit oscillates and at what frequency.

During the theoretical introduction and development of OBT [5], [6], as well as during its application to SC filters [8], [9], one fact was consistently ignored. Namely, the oscillation frequency of the resulting test oscillator is normally above the 3 dB cut-off frequency of the operational amplifiers embedded in the filter. It was assumed by default that the opamps have infinite gain, and their phase shift was neglected. It was shown, however, that these approximations are not justified and that the theoretical oscillation frequency obtained in that way is not realistic [10].

In the following sections the principles of the oscillation testing method are described first. After that, the functionality and implementation of the active SC Notch filter with a high quality factor are explained in more detail. The principles of its testing are then analyzed further. The application of OBT was verified by simulation in the LTspice environment. The aim of the research was to establish the OBT environment for the circuit under test (CUT). As a result, we obtained stable oscillations of the output signal, and the circuit was then tested for the presence of catastrophic defects in capacitors. From the defect modelling and simulation we obtained a fault dictionary, which is shown in the last sections of the paper, where experimental results and conclusions are given.

#### II. OSCILLATION BASED TESTING

The idea of using the astable behaviour of a circuit with positive feedback is not very old [8]. The first step in its implementation is to transform the CUT into an oscillator. By measuring the oscillation frequency, one can determine whether the circuit is faulty or fault free.

In this way the CUT is transformed into a signal generator. The main advantage of this approach is that no input signal is required. Instead, observing only a few periods of the output signal is enough to determine their duration. In general, the method is applicable to analogue and mixed-signal circuits, and its first step involves decomposing the complex system under test (SUT) into simpler blocks that can be tested separately [9]. Fig. 1 shows the basic building blocks of the OBT technique. In the second phase, the individual blocks under test (BUTs) are converted into oscillators. The third phase involves measuring their oscillation frequencies, and the final phase compares them with the measurement results for the FF topology.
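The measurement and comparison phases can be sketched in a few lines of Python. The sketch estimates the oscillation frequency from the rising zero crossings of a sampled waveform and compares it with a fault-free reference. The synthetic 1 kHz waveform, the 100 kHz sampling rate and the 5 % acceptance band are all assumptions made for illustration, not values prescribed by the method.

```python
import math


def estimate_frequency(samples, sample_rate_hz):
    """Estimate frequency from rising zero crossings (linear interpolation)."""
    crossings = []
    for i in range(1, len(samples)):
        previous, current = samples[i - 1], samples[i]
        if previous < 0.0 <= current:
            fraction = -previous / (current - previous)
            crossings.append((i - 1 + fraction) / sample_rate_hz)
    if len(crossings) < 2:
        return None  # no measurable oscillation
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return 1.0 / period


def matches_fault_free(measured_hz, reference_hz, tolerance=0.05):
    """Pass/fail decision: frequency within a relative band of the reference."""
    if measured_hz is None:
        return False
    return abs(measured_hz - reference_hz) <= tolerance * reference_hz


# Synthetic stand-in for a measured oscillator output (assumed values).
sample_rate_hz = 100e3
waveform = [math.sin(2.0 * math.pi * 1e3 * n / sample_rate_hz + 0.3)
            for n in range(1000)]

measured_hz = estimate_frequency(waveform, sample_rate_hz)
print(measured_hz, matches_fault_free(measured_hz, 1e3))
```

A constant output yields no zero crossings, so `estimate_frequency` returns `None` and the block is reported as failing, which corresponds to the case in which a defect stops the oscillation altogether.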

The obtained signal waveforms should now be analysed, which usually involves frequency measurements, but can also include measurement of the DC value and analysis of the harmonic distortion. Since the obtained signals are to some extent standardized (sinusoidal shapes or trains of pulses), these analyses can be standardized regardless of the block to be tested [2].



Although it looks simple at first sight, the implementation of this method is limited by the possibility of converting the nominal circuit into an oscillator in which the presence of a defect is reflected in the oscillation frequency. This process is difficult to systematize and depends largely on the experience and creativity of the designer. The principles of oscillator design cannot easily be applied to OBT implementation. The goal in designing OBT oscillators is not a stable frequency and amplitude of the generated sinusoidal signal; instead, the amplitude and frequency of oscillation (not necessarily a proper sinusoid) should be as sensitive as possible to the presence of defects in the circuit, whatever their nature.



Fig. 2. High Q Biquad Notch SC filter cell



Fig. 3. The Biquad Notch SC filter with high Q factor in LTspice

In order to cover all defects with the test, it is sometimes necessary to include some other measurements (for example of the supply current in CMOS circuits), or analyze some other parameter of the response.

# III. ACTIVE SC NOTCH FILTER CELL WITH HIGH QUALITY FACTOR

Fig. 2 shows the topology of the universal second order SC filter cell. This topology allows the coefficients, that is, the capacitor values, to be modified in order to realize all four types of signal filtering: HP, LP, BP, and BS.

With a proper choice of coefficients the realized circuit performs Notch, that is, Band Stop, filtering of the input signal [11]. In our case the filter specifications are: centre (notch) frequency $f_0=1$ kHz and quality factor Q=10. In addition, the frequency of the two-phase, non-overlapping clock signal is chosen to be 100 kHz, and the capacitances $C_1$ and $C_2$ are 1 pF. The filter's transfer function can be written as (1):

$$T(s) = \frac{V_{out}(s)}{V_{in}(s)} = -\frac{K_2 s^2 + K_1 s + K_0}{s^2 + \frac{\omega_0}{Q} s + \omega_0^2}. \tag{1}$$

Coefficients that determine values of capacitances are calculated according to equations (2-6):

$$\alpha_1 = \frac{K_0 T}{\omega_0} \tag{2}$$

$$\alpha_2 = \left| \alpha_5 \right| = \omega_0 T \tag{3}$$

$$\alpha_3 = \frac{K_1}{\omega_0} \tag{4}$$

$$\alpha_4 = \frac{1}{Q} \tag{5}$$

$$\alpha_6 = K_2. \tag{6}$$

In order to implement Notch filtering, one should choose $K_1=K_2=0$ and $K_0=(3 \cdot \omega_0)^2$. The frequency response of such a filter cell is shown in Fig. 4. As seen from this figure, the notch frequency is 1 kHz.
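As a numerical illustration, the sketch below evaluates equations (2)-(5) for the specification given above ($f_0=1$ kHz, Q=10, 100 kHz clock, with $T$ taken as the clock period). Scaling each coefficient by the 1 pF unit capacitance is an assumption suggested by the capacitor labels of the form $\alpha_i C_1$ used in this paper; the function and variable names are illustrative only.

```python
import math


def sc_biquad_coefficients(f0_hz, q, f_clk_hz, k0, k1):
    """Evaluate equations (2)-(5); T is taken as the clock period."""
    period = 1.0 / f_clk_hz
    omega0 = 2.0 * math.pi * f0_hz
    return {
        "alpha1": k0 * period / omega0,  # equation (2)
        "alpha2": omega0 * period,       # equation (3), equal to |alpha5|
        "alpha3": k1 / omega0,           # equation (4)
        "alpha4": 1.0 / q,               # equation (5)
    }


omega0 = 2.0 * math.pi * 1e3
coefficients = sc_biquad_coefficients(
    f0_hz=1e3, q=10.0, f_clk_hz=100e3, k0=(3.0 * omega0) ** 2, k1=0.0
)

# Assumed scaling: each capacitor equals its coefficient times 1 pF.
unit_capacitance_f = 1e-12
capacitances_ff = {
    name: value * unit_capacitance_f * 1e15
    for name, value in coefficients.items()
}

for name in sorted(coefficients):
    print(f"{name} = {coefficients[name]:.4f} -> {capacitances_ff[name]:.1f} fF")
```

The small value of $\alpha_2 = \omega_0 T \approx 0.063$ reflects the ratio between the notch frequency and the clock frequency, and shows why the filter accuracy depends on how well such capacitance ratios can be realised.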



amplitude, and the dotted is the phase characteristic.

The LTspice schematic [12] of the filter implementation, after its transformation into an oscillator, is presented in Fig. 3. The switches are modeled with an open resistance of $1M\Omega$ and a closed resistance of $1\Omega$. LTC6078 opamps from the LTspice CMOS library are used for this purpose.

# IV. OBT NOTCH TESTING

In order to apply the OBT method to Notch filter testing, the oscillator output signal first has to be amplified and stabilized before it is brought back to the circuit's input. This is shown in Fig. 3. An additional amplifying stage is introduced in the feedback loop. The gain of the additional amplifier is 2, and the achieved voltage levels are $\pm 300$ mV. For the FF filter, the oscillation frequency is 654 Hz. A very strong influence of the opamps' nonidealities can be noticed here. Namely, with ideal opamps, oscillations would occur at the notch frequency of the filter, that is, 1 kHz [10].

| Defect number | Defect                 | $f_{osc}$ [kHz] | Amplitude [dB] | Phase [degrees] |
|---------------|------------------------|-----------------|----------------|-----------------|
| 0             | FF                     | 654             | -19.78         | -130.37         |
| 1.            | $\alpha_2 C_1$ - open  | 636             | -22.51         | -171.56         |
| 2.            | $\alpha_2 C_1$ - short | 49.87           | -72.33         | 55.61           |
| 3.            | $\alpha_4 C_1$ - open  | 654             | -22.48         | 177.34          |
| 4.            | $\alpha_4 C_1$ - short | 0.969           | -48.87         | -167.34         |
| 5.            | $\alpha_1 C_1$ - open  | 653.46          | -19.43         | -82.35          |
| 6.            | $\alpha_1 C_1$ - short | 2.298           | -29.33         | -159.05         |
| 7.            | $C_1$ - open           | 1.2             | -23.63         | -154.27         |
| 8.            | $C_1$ - short          | 16.85           | -75.70         | 88.98           |
| 9.            | $\alpha_5 C_2$ - open  | 87.06           | -65.22         | 29.74           |
| 10.           | $\alpha_5 C_2$ - short | No oscillations | 3V DC          |                 |
| 11.           | $C_2$ - open           | 654             | -18.61         | -150.44         |
| 12.           | $C_2$ - short          | 29.12           | -67.65         | 137.84          |
| 13.           | $\alpha_6 C_2$ - open  | 2.22            | -5.55          | -50.28          |
| 14.           | $\alpha_6 C_2$ - short | 21.90           | -10.08         | 121.96          |

TABLE I FAULT DICTIONARY

When defects are inserted, three situations can occur: oscillation at the frequency of the FF circuit, a change of the oscillation frequency compared with the FF circuit, and the absence of oscillations. In the first case, the presence of the defect cannot be detected by observing the oscillation frequency alone, and some additional measurement or analysis of an output signal parameter is required (for example, harmonic distortion, DC value or phase). In the second and third cases, the presence of defects in the circuit is much easier to detect.

Fault simulation plays a key role in test synthesis. For the Notch SC BS filter, an analog circuit simulator is recommended. Unfortunately, many circuit simulators use stable numerical integration formulae for solving differential equations in the time domain, such as the backward Euler rule or higher order rules such as Gear's formulae [10], [13]. With a stable approximation rule one cannot simulate an astable circuit such as an oscillator [14], [15]. Because of this limitation, we chose the LTspice simulator, since it offers the trapezoidal and modified trapezoidal rules for time derivative approximation.
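The effect of the integration rule can be seen on the simplest possible oscillator. The Python sketch below integrates an undamped second order system, $\dot{x} = y$, $\dot{y} = -\omega^2 x$, with the backward Euler and the trapezoidal rule and reports the remaining amplitude after ten periods. It is a stand-alone illustration with assumed values (1 kHz, 100 steps per period) and does not reproduce any simulator's internal algorithm.

```python
import math


def remaining_amplitude(method, omega, step, n_steps):
    """Integrate x' = y, y' = -omega^2 x from (1, 0) and return the amplitude."""
    x, y = 1.0, 0.0
    for _ in range(n_steps):
        if method == "backward_euler":
            # Solve (I - h A) z_new = z_old in closed form.
            det = 1.0 + (step * omega) ** 2
            x, y = (x + step * y) / det, (y - step * omega ** 2 * x) / det
        elif method == "trapezoidal":
            # Solve (I - h/2 A) z_new = (I + h/2 A) z_old in closed form.
            k = step / 2.0
            rx = x + k * y
            ry = y - k * omega ** 2 * x
            det = 1.0 + (k * omega) ** 2
            x, y = (rx + k * ry) / det, (ry - k * omega ** 2 * rx) / det
        else:
            raise ValueError(f"unknown method: {method}")
    return math.hypot(x, y / omega)


omega = 2.0 * math.pi * 1e3   # assumed 1 kHz oscillator
step = 1e-5                   # assumed step: 100 points per period
n_steps = 1000                # ten periods

be_amplitude = remaining_amplitude("backward_euler", omega, step, n_steps)
tr_amplitude = remaining_amplitude("trapezoidal", omega, step, n_steps)
print(f"backward Euler: {be_amplitude:.3f}, trapezoidal: {tr_amplitude:.3f}")
```

Backward Euler introduces numerical damping, so the oscillation of this lossless system dies out within a few periods, whereas the trapezoidal rule keeps the amplitude constant. The same artificial damping can suppress the oscillations of a test oscillator in simulation.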

#### Fault simulations

There are two groups of defects in analog circuits: hard, that is, catastrophic defects, which change the circuit topology, and soft, that is, parametric defects, which affect only some parameter value within the circuit. During testing and OBT oscillator simulation, hard defects of the capacitances were examined. The aim of this research is to create an OBT environment that ensures stable oscillations. By simulating the catastrophic defects, we have confirmed the theoretical assumptions about the effects of the defects on the oscillation frequency of the OBT oscillator and on additional parameters of the circuit's response.

Prior to the simulation it is necessary to set the initial conditions in order to enable the circuit to oscillate.



Table I shows the simulation results for hard defects. It gives the oscillation frequencies in the absence and in the presence of particular capacitance defects. The first row corresponds to the FF response, while the others refer to hard defects of the capacitances. It can be noticed that for some hard defects the oscillation frequency is very similar to the FF one. If only this parameter is observed, such defects cannot be detected. This indicates that some other response parameter should be measured and observed as well.
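This observation can be checked directly on the data of Table I. The Python sketch below stores the dictionary entries by defect number, with values and units exactly as printed in the table, lists the defects whose oscillation frequency lies within 1 % of the FF value, and then computes how far their phase is from the FF phase. The 1 % band is an assumed threshold, not one given in the paper.

```python
FAULT_FREE = 0

# Table I: defect number -> (f_osc, amplitude, phase), units as printed.
# Defect 10 shows no oscillations and is stored as None.
TABLE_I = {
    0: (654.0, -19.78, -130.37),
    1: (636.0, -22.51, -171.56),
    2: (49.87, -72.33, 55.61),
    3: (654.0, -22.48, 177.34),
    4: (0.969, -48.87, -167.34),
    5: (653.46, -19.43, -82.35),
    6: (2.298, -29.33, -159.05),
    7: (1.2, -23.63, -154.27),
    8: (16.85, -75.70, 88.98),
    9: (87.06, -65.22, 29.74),
    10: None,
    11: (654.0, -18.61, -150.44),
    12: (29.12, -67.65, 137.84),
    13: (2.22, -5.55, -50.28),
    14: (21.90, -10.08, 121.96),
}


def masked_by_frequency(table, tolerance=0.01):
    """Defects whose frequency is within a relative band of the FF value."""
    f_ref = table[FAULT_FREE][0]
    masked = []
    for number, entry in sorted(table.items()):
        if number == FAULT_FREE or entry is None:
            continue
        if abs(entry[0] - f_ref) <= tolerance * f_ref:
            masked.append(number)
    return masked


def phase_distance_deg(a_deg, b_deg):
    """Smallest angular distance between two phases, in degrees."""
    difference = abs(a_deg - b_deg) % 360.0
    return min(difference, 360.0 - difference)


masked = masked_by_frequency(TABLE_I)
phase_gaps = {
    number: phase_distance_deg(TABLE_I[number][2], TABLE_I[FAULT_FREE][2])
    for number in masked
}
print(masked, phase_gaps)
```

With the assumed 1 % band, defects 3, 5 and 11 are indistinguishable from the FF circuit by frequency alone, while their phases differ from the FF phase by at least about 20 degrees.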

Table I also gives some important parameters from the FFT analysis of the output signal. We have chosen to observe the amplitude and the phase of the output signal obtained from its spectrum after the FFT analysis. The spectrum of the output signal for the FF circuit is shown in Fig. 5.

Additional results indicate that these parameters of the circuit's response can be used during testing not only for rejecting bad devices, but for diagnostics and recognition of the particular defects in the circuit, too [16], [17].

## V. CONCLUSION

In this paper we have shown an efficient implementation of the OBT method for testing active SC Notch biquad filter cells. By measuring the oscillation frequency and observing the amplitude and phase of the first harmonic for the output signal, we have achieved 100% defect coverage. In this study we took into account the influence of the real parameters of the opamp model. The applied testing method does not require development and application of any input test signal, and all measurements are performed for only one signal at only one test-point. By applying additional simple logic, this technique can be efficiently used as part of some BIST, Analogue Scan or DFD solution. Future research will be directed to analysis of effects of switching defects; hard and soft/ideal or real.

#### ACKNOWLEDGEMENT

This research was partly funded by The Ministry of Education and Science of Republic of Serbia under contract No. TR32004.

#### References

- [1] Litovski, V., *Basics of Electronic Testing*, 1<sup>st</sup> ed., Univ. of Niš, Faculty of Electronic engineering, Niš, Serbia, 2009. In Serbian.
- [2] Milić, M., and Litovski, V., "Application of the oscillation based method to a bandstop filter", Proceedings of the VIII INDEL symposium, Banja Luka, Bosnia and Hercegovina, 4-6. November, 2010, pp. 100-104. In Serbian.
- [3] Stošović, M. A., Milić, M., Zwolinski, M., and Litovski, V., "Oscillation-based diagnosis using artificial neural networks based inference mechanism", Computers and electrical engineering, vol. 39, no. 2, December, 2012, pp. 190-201.

- [4] Stošović, M. A., Milić, M., and Litovski, V., "Analog Filter Diagnosis Using the Oscillation Based Method" Journal of Electrical Engineering - Elektrotechnicky, Vol. 63, No. 6, 2012 pp. 349-356.
- [5] Kač, U., Novak, F., "Oscillation Test Scheme of SC Biquad Filters Based on Internal Reconfiguration," J. Electron Test, Vol. 23, No. 6, December, 2007, pp. 485-495.
- [6] Kač, U., Novak, F., "Reconfiguration Schemes of SC Biquad Filters for Oscillation Based Test," Information Technology and Control, Vol. 42, No. 1, February 2013, pp. 38-47.
- [7] Allen, P., and Holberg, D, CMOS Analog Circuit Design, 2<sup>nd</sup> ed. New York, USA: Oxford University Press, 2002.
- [8] K. Arabi, B. Kaminska, "Testing analog and mixed-signal integrated circuits using oscillation-test method," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 16, no. 7, pp. 745-753, 1997.
- [9] Sanchez, H., Vega, V., Rueda, R., Dias, L., Oscillation-Based Test in Mixed-Signal Circuits, 1<sup>st</sup> ed., Kluwer Academic Publishers, 2007.
- [10] Milić, M., Stošović, M. A., and Litovski, V., "Oscillation based analog testing – a case study", Proc. 34<sup>th</sup> Int. Conf. on Information and Communication Technology, Electronics and Microelectronics MIPRO 2011, Opatija, Croatia, Vol. 1, 23-27. May, 2011, pp. 118-123.
- [11] Milić, M., and Litovski, V., "Testing of the parametric defect of an SC cell by the oscillation based method", Proceedings of the LVII ETRAN conference, Zlatibor, Serbia, 3-6. June, 2013, pp. EL2.3. In Serbian.
- [12] www.linear.com
- [13] Litovski, V., Electronic circuit design, Simulation, Optimization, testing and physical design, 1st ed., Nova Jugoslavija, Vranje, Serbia, 2000. In Serbian.
- [14] Litovski, V., and Zwolinski, M., VLSI Circuit Simulation and Optimization. Chapman and Hall, London, 1997.
- [15] Thompson, C., "A study of numerical integration techniques for use in the companion circuit method of transient circuit analysis", ECE Technical reports, Purdue University School of Electrical Engineering, 1992.
- [16] Litovski, V., Stošović, M. A., and Zwolinski, M., "Analogue Electronic Circuit Diagnosis Based on ANNs", Microelectronics Reliability, Vol. 46, No. 8, August 2006 pp. 1382-1391.
- [17] Stošović, M. A., Milovanović, D., and Litovski, V., "Hierarchical Approach to Diagnosis of Mixed-mode Circuits using Artificial Neural Networks", Neural Network World, Vol. 21, No. 2, 2011, pp. 153-168.

# Analog Design Challenges in Advanced CMOS Process Node

# Dejan Mirković, Predrag Petković and Dragiša Milovanović

*Abstract* – This paper deals with the problems of porting integrated circuit (IC) designs to a new, scaled process node. Problems arise especially when the analog part of the chip has to be transferred. New process nodes provide many high end capabilities, e.g. high speed and low power consumption. On the other hand, more and more parasitic and higher order effects come into play. Therefore, extensive simulations of the standard MOS device are required in order to reveal the true device behaviour, which is crucial in analog IC design. For characterization purposes, Cadence<sup>®</sup> Open Command Environment for Analysis (OCEAN) is used in conjunction with GNU Octave. Conclusions regarding design strategies are drawn, and important trade-offs are pointed out.

*Keywords* – CMOS Process nodes, analog integrated circuits, simulation, MOS device

#### I. INTRODUCTION

Contemporary submicron processes are primarily focused on improving device characteristics in the digital domain. The main motive behind aggressive shrinking of dimensions and power supply voltage is the possibility of obtaining a higher operating frequency and lower power consumption. As far as digital circuitry is concerned, the most important operation is to turn a MOS device (switch) off and on efficiently, that is, as fast as possible and with the smallest amount of energy. The highest frequency at which a single device can operate is defined as the unity current gain frequency, $f_T$, i.e. the frequency at which the ratio of drain to gate current, $i_d/i_g$, equals one. This frequency can be estimated easily by considering a common source topology with a dominant gate terminal parasitic capacitance, $C_{GG} = C_{GS} + C_{GD}$ (which for a long channel device becomes $C_{GG} \approx C_{GS}$). Since the drain current is $g_m$ times $V_{GS}$, and $V_{GS}$ lies across $C_{GS}$, a relation between $i_d$ and $i_g$ follows. In saturation (strong inversion) $C_{GS}$ can be further approximated by $2C_{ox}WL/3$, where $C_{ox}$ is the gate oxide capacitance density. For square-law devices $g_m$ equals $\mu C_{ox} W(V_{gs} - V_{th})/L$, where $V_{gs} - V_{th}$ is the overdrive voltage, $V_{ov}$, and $\mu$, $W$ and $L$ stand for the carrier mobility, width and length of the MOS device, respectively. Finally, the approximate $f_T$ expression for a long channel device is given in (1).

Dejan Mirković, Predrag Petković and Dragiša Milovanović are with the Department of Electronics, Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, 18000 Niš, Serbia, E-mail: {dejan.mirkovic, predrag.petkovic, dragisa.milovanoivc}@elfak.ni.ac.rs.

$$f_T = \frac{g_m}{2\pi C_{GG}} \propto \frac{3\mu V_{ov}}{4\pi L^2} \tag{1}$$

From (1) it is obvious that shorter devices with larger overdrive voltage will operate at higher frequency.
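The steps leading to (1) can be written out explicitly. The following is a sketch under the long channel assumptions stated above, taking the usual saturation approximation $C_{GG} \approx C_{GS} \approx \tfrac{2}{3} C_{ox} W L$:

$$
\begin{aligned}
g_m &= \frac{\mu C_{ox} W V_{ov}}{L}, \qquad C_{GG} \approx \frac{2}{3}\, C_{ox} W L, \\
f_T &= \frac{g_m}{2\pi C_{GG}} \approx \frac{\mu C_{ox} W V_{ov} / L}{2\pi \cdot \tfrac{2}{3}\, C_{ox} W L} = \frac{3 \mu V_{ov}}{4\pi L^2}.
\end{aligned}
$$

Both $C_{ox}$ and $W$ cancel, so within this approximation halving $L$ at fixed $V_{ov}$ increases $f_T$ four times.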

It is well known that the dominant load in CMOS technology is capacitive and that the maximum power consumption occurs when charging/discharging that capacitive load, $C_{load}$. The switching power, $P_{sw}$, can be related to the maximum switching current, $I_{Dmax}$, the switching frequency, $f_{sw}$, and the worst case voltage change across the capacitive load, $\Delta V_{Dmax}$, as given in (2).

$$P_{sw} = I_{D\max} \Delta V_{D\max} \propto 2\pi f_{sw} C_{load} V_{DD}^2 \tag{2}$$

The worst case voltage change corresponds to the entire power supply swing from 0V to $V_{DD}$. Therefore, (2) reveals the motive to keep the power supply as low as possible in order to reduce dynamic power consumption.
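As a rough illustration of (2), the following Python sketch compares the dynamic power for the two supply voltages considered in this paper (3.3V and 1.2V) at equal switching frequency and load. The function name and the example values of $f_{sw}$ and $C_{load}$ are assumptions made only for this illustration.

```python
import math


def switching_power(f_sw, c_load, v_dd):
    """Dynamic power estimate following the proportionality in (2)."""
    return 2.0 * math.pi * f_sw * c_load * v_dd ** 2


# Hypothetical operating point: 100 MHz switching frequency, 10 fF load.
p_350 = switching_power(100e6, 10e-15, 3.3)  # supply of the 350nm node
p_65 = switching_power(100e6, 10e-15, 1.2)   # supply of the 65nm node

# Frequency and load cancel, so the ratio is (3.3 / 1.2) ** 2.
ratio = p_350 / p_65
print(f"power ratio 3.3V / 1.2V: {ratio:.2f}")
```

Since only $V_{DD}$ differs, the ratio depends on the square of the supply ratio alone, which is why supply scaling is so attractive for digital circuitry.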

All these properties, towards which deep submicron process nodes strive, help digital operation only. Analog circuitry often has exactly the opposite requirements. Sometimes one needs to sacrifice power consumption to fulfil noise and speed constraints, as in mixed-signal circuits. In other cases, such as low power RF applications, the weak inversion region is used in order to achieve high speed while keeping power consumption low. In practice there is always a trade-off between several quantities (power, speed, delay, noise, signal swing etc.). Shrinking of power supply and dimensions in submicron processes only makes things worse and poses an additional challenge for analog IC design. As a result, effects arise such as leakage mechanisms (reverse biased junction current, gate induced drain leakage, direct gate tunnelling, sub-threshold leakage), which increase static power consumption, drain induced barrier lowering, lower gate-oxide breakdown voltage etc. [1], [2], [3]. Apart from these effects, $V_{th}$ does not scale linearly with $V_{DD}$ [4]. The resulting drastic reduction of voltage headroom is the main analog design challenge.

The prime goal of this paper is to provide an overview of a deep submicron, 65nm, process and to emphasize the differences compared to a 350nm process. Besides, the 65nm node will be the target technology for the new Integrated Power Meter (IMPEGIII) chip developed in the LEDA laboratory [5]. The deep submicron 65nm process is compared with the 350nm process from the standpoint of MOS device capabilities. Even though the 350nm process is considered obsolete and has been replaced by the younger 180nm and 90nm nodes, it is still favourable for analog and mixed-signal design. The previous version of the IMPEG chip was implemented in 350nm technology, hence the reason for choosing it for comparison.

In the following section some guidelines about important device characteristics and how to obtain them will be given. The environment used to automate the simulation process will be discussed. Then, in the third section, MOS devices of the 350nm and 65nm process nodes will be compared through simulation results. The conclusion will summarise important findings obtained from the previous sections.

#### II. DEVICE CHARACTERISTICS

This section covers the major device characteristics and provides insight into the test bench circuits used to extract them. More detailed information covering device models, results of exhaustive corner analyses and measurements of a single device is contained in the proprietary Process Design Kit (PDK) documentation. However, this documentation does not provide the relation between key design parameters (e.g. gain, bandwidth) and device dimensions. Therefore it is necessary to examine device behaviour in a real circuit environment (e.g. with feedback). For the sake of simplicity all circuits discussed below are for the NMOS device and can easily be adapted to PMOS.

#### A. Intrinsic small signal gain

Perhaps the most important design parameter of an analog circuit is its small signal voltage gain. When a new process node appears on the market, designers are usually interested in how much gain the smallest device can provide. Using the small signal model of the MOSFET, (3) is obtained:

$$A_{v0} = g_m r_o = \frac{2V_E L}{V_{ov}}, \tag{3}$$

where $g_m$ and $r_o$ are the small signal parameters, transconductance $(2I_D/V_{ov})$ and output resistance $(V_EL/I_D)$, respectively. Here $I_D$ stands for the transistor bias current and $V_E$ represents a process dependent parameter expressed in volts per metre [6]. This process parameter can be thought of as the equivalent of the Early voltage of a bipolar transistor. From (3) it is obvious that increasing the device length increases the gain, at least at DC. Since the $V_E$ parameter depends strongly on the process, it cannot be accurately estimated. Therefore simulation is the standard way of extracting the small signal gain. For this purpose the test bench circuit in Fig. 1 is used.
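The intermediate step behind (3) is a direct substitution of the two small signal parameters quoted above, written here only to make the cancellation of the bias current visible:

$$A_{v0} = g_m r_o = \frac{2 I_D}{V_{ov}} \cdot \frac{V_E L}{I_D} = \frac{2 V_E L}{V_{ov}}.$$

Within this first order model the gain therefore does not depend on $I_D$ once $V_{ov}$ is fixed, and doubling $L$ at the same $V_{ov}$ doubles $A_{v0}$.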

This circuit simulates the real working environment of the device. The DC current through the device is always determined by some external bias circuit and mirrored to $I_{bias}$. This current sets the appropriate $V_{ov}$ and consequently $V_{GS}$. In order to keep the device in the linear region and at the same time sweep the DC voltage across it, positive feedback is established through an ideal opamp. This way $V_{DS}$ is forced to track the change of $V_{ref}$ while preserving the bias point set by $I_{bias}$. The whole structure behaves similarly to a diode connected device with fixed bias current.



Fig. 1 Test bench circuit for $A_{v0}$ simulation

The circuit in Fig. 1 has proved to be a quite popular test bench since it requires the smallest number of sweep parameters.

Two test cases are performed. First, $V_{ref}$ is swept with $I_{bias}$ fixed in order to examine $A_{v0}$ versus $V_{ov}$. In the second case $I_{bias}$ is swept while $V_{ref}$ remains constant. The sweep simulations are repeated for different $L_n$ values. These tests extract the $A_{v0}$ behaviour for different device lengths and bias conditions.

#### B. Composite figure of merit

Besides small signal gain, an equally important device parameter is the unity current gain frequency, $f_T$. It determines how fast the device can operate at a given bias point.

Since voltage headroom is drastically reduced in deep submicron processes, it is inevitable that some devices will be forced to operate at the boundary between strong and weak inversion. Not infrequently only the sub-threshold region is used [7]. There the square law dependence of the drain current is no longer valid, which makes hand calculations impractical when designing in submicron process nodes. The common method for mapping design parameters to transistor bias points, and consequently dimensions, is the $g_m/I_D$ curve. This ratio is often called device efficiency because it tells how much transconductance can be obtained per unit of bias current. Using this measure one can make sure not to spend too much current (energy) for the required transconductance, which leads to a power efficient design.

In order to find the optimal bias point of the device, the so-called composite figure of merit (CFOM) should be extracted. The expression for CFOM is $f_T \times g_m/I_D$ and its maximum gives the optimum bias point.

For extraction of all these parameters a diode connected device is used. Fig. 2 illustrates the test bench circuit. Here the sweep parameter is $V_d$. Since $V_{DS}$ equals $V_{GS}$, saturation is ensured. Again, the sweep simulation is repeated for different channel lengths.
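Once the sweep data are available, locating the optimum bias point reduces to finding the maximum of the product $f_T \times g_m/I_D$. A minimal Python sketch of that post-processing step follows. The function name and the sample values are hypothetical and are not taken from the simulations reported in this paper.

```python
def cfom_optimum(v_ov, f_t, gm_over_id):
    """Return (index, V_ov, CFOM) at the maximum of f_T * g_m/I_D."""
    cfom = [f * g for f, g in zip(f_t, gm_over_id)]
    best = max(range(len(cfom)), key=cfom.__getitem__)
    return best, v_ov[best], cfom[best]


# Hypothetical sweep results, one entry per simulated bias point.
v_ov = [0.0, 0.1, 0.2, 0.3, 0.4]        # overdrive voltage [V]
f_t = [2e9, 6e9, 10e9, 12e9, 13e9]      # unity current gain frequency [Hz]
gm_id = [20.0, 14.0, 10.0, 7.0, 5.0]    # device efficiency [1/V]

idx, v_opt, cfom_max = cfom_optimum(v_ov, f_t, gm_id)
print(idx, v_opt, cfom_max)
```

The maximum appears because $f_T$ grows with $V_{ov}$ while $g_m/I_D$ falls, so their product peaks at an intermediate overdrive voltage.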


Fig. 2 Test bench circuit for CFOM simulation

#### C. Noise

The next aspect of importance is noise. Reduced supply voltage implies smaller signal swings. As the signal amplitude becomes smaller, the influence of noise increases. The test bench circuit is shown in Fig. 3.



Fig. 3 Test bench circuit for noise performance simulation

Again a diode connected device is exploited, but with fixed DC current. Since an ideal current source ensures infinite load impedance, only the noise of the device is present. Noise analysis is performed for different device lengths.

For simulation the Cadence<sup>®</sup> Spectre simulator is used. Since a number of parametric sweep simulations are required, the procedure is automated using the SKILL scripting language in OCEAN under Cadence Design System<sup>®</sup> (CDS) [8]. This way a call of a single script performs all necessary simulations. Although CDS contains plotting programs, they usually do not provide enough freedom and control over the plotting process. Therefore GNU Octave is used for data presentation. GNU Octave is an open-source alternative to MATLAB<sup>®</sup>, a proprietary programming framework for numerical mathematics and data analysis [9]. Both platforms provide a large number of useful mathematical operations and functions, support interactive and batch mode, and run under UNIX/Linux operating systems. All this makes them compatible and attractive to combine.

The subsequent section presents results obtained using the described combination of software for simulation of the aforementioned test benches. The procedure is applied to two process nodes.

#### **III. SIMULATION RESULTS**

As announced in Section I, transistor performance in two process nodes will be compared. The transistors are from the same manufacturer, Taiwan Semiconductor Manufacturing Company (TSMC). Both device types, i.e. NMOS and PMOS, are examined. Since both showed similar differences between the two processes in terms of $A_{v0}$, $f_T$, $g_m/I_D$ and CFOM, only results for NMOS are presented. The exception is the comparison of noise performance, for which both device types are presented. Three channel lengths are chosen, as shown in Table I.

TABLE I DEVICE LENGTHS

| Process | Length [µm] |      |      |
|---------|-------------|------|------|
| 350nm   | 0.8         | 1.6  | 3.2  |
| 65nm    | 0.24        | 0.48 | 0.96 |

To minimise short channel effects, twice the minimal length for the 350nm process and three times the minimal length for the 65nm process are adopted as the starting values of device length. In all cases the device width is chosen to be ten times the length. These are common proportions for the smallest device in analog IC applications.

First, the small signal DC voltage gain is examined. Fig. 4 shows this parameter versus output voltage, $V_{DS}$, for a fixed bias current, $I_{bias}$, of 250 $\mu$A.

It is important to mention that this is a purely DC measure, hence the large values of the gain. The first thing to notice in Fig. 4 is the different power supply, 3.3V for 350nm vs. 1.2V for 65nm. For example, a $V_{DS}$ bias of 1.2V and two times the minimal transistor length, 800nm, provide a gain of about 100 in the 350nm process. This is obtained for a bias of about a third of $V_{DD}$. To achieve the same gain in the 65nm process one needs to use nearly the full power supply, i.e. 1.2V, and a length at least 16 times larger than the minimal one, 960nm.





Fig. 4 Small signal DC gain vs. drain source voltage

Fig. 5 presents the small signal DC gain versus bias current for a fixed reference voltage, $V_{ref}$, of $V_{DD}/2$.



Fig. 5 Small signal DC gain vs. drain bias current

It is obvious that increasing the bias current does not solve the gain reduction in submicron process nodes. Therefore, besides standard cascoding, advanced design techniques such as gain boosting, bootstrapping and current cancellation should be used [6].

Unfortunately, the threshold voltage $V_{th}$ also changes with transistor dimensions. For these test cases simulation results showed that $V_{th}$ ranges from 0.57V to 0.61V for the 350nm and from 0.47V to 0.51V for the 65nm process. This results in an absolute change, $\Delta V_{th}$, of 40mV in both cases. The $V_{th}$ variation in the submicron process becomes significant because of the reduced voltage headroom. That is why it is not wise to use $V_{th}$ as a reliable design parameter.

Fig. 6 shows the dependence of $f_T$ on overdrive voltage. It is clear that the new submicron process provides speed higher by nearly an order of magnitude. Certainly, this is only valid for small device lengths. Increasing the length for the same bias conditions reduces the operating frequency, as (1) suggests.



Fig. 6 Unity current gain frequency vs. overdrive voltage

On the other hand, choosing a larger $V_{ov}$ increases $f_T$. This fact should be used with caution because a larger $V_{ov}$ reduces device efficiency, as shown in Fig. 7.

Fig. 7 shows that device efficiency is the only reliable design parameter, i.e. it is relatively independent of device dimensions and other effects (short channel, carrier velocity saturation etc.). The shape of the curve is the same for both process nodes. The value of $g_m/I_D$ at the sub-threshold edge, $V_{ov} = 0V$, is 15 for 65nm, which is lower than the value of 20 for 350nm. However, the submicron node provides higher efficiency in the weak inversion region than the 350nm node, 32 versus 27. Hence the reason to exploit this region of operation in submicron processes.

The best way to establish optimal  $V_{ov}$  is to look at CFOM. This parameter is presented in Fig. 8.



Fig. 7 Efficiency vs. overdrive voltage

CFOM has its maximum at about 0.2V of overdrive voltage for both process nodes. It is important to notice that this value did not scale down with $V_{DD}$ at all. Let us assume that one half of $V_{DD}$ is dedicated to the signal range and the other half to transistor bias. The older, 350nm, node will allow about eight devices in the cascode, while the newer, 65nm, node allows only three. Therefore, the sub-threshold region of operation and consequently advanced design techniques are inevitable in submicron process nodes, especially when low power is required.
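The device counts quoted above follow from dividing the bias half of $V_{DD}$ by the optimal overdrive voltage. The short Python sketch below reproduces that arithmetic under the simplifying assumption that every stacked device needs about $V_{ov} = 0.2$V of voltage drop. The function name is hypothetical.

```python
def max_stacked_devices(v_dd_mv, v_ov_mv):
    """Devices that fit into half of V_DD when each needs v_ov_mv.

    Integer millivolts are used to avoid floating point rounding
    in the division (0.6 / 0.2 is slightly below 3 in binary floats).
    """
    bias_headroom_mv = v_dd_mv // 2
    return bias_headroom_mv // v_ov_mv


n_350 = max_stacked_devices(3300, 200)  # 350nm node, V_DD = 3.3V
n_65 = max_stacked_devices(1200, 200)   # 65nm node, V_DD = 1.2V
print(n_350, n_65)
```

With these numbers the sketch gives eight devices for the 350nm node and three for the 65nm node, in agreement with the estimate in the text.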



Fig. 8 Unity current gain frequency, Efficiency product vs. overdrive voltage

Finally, noise performance is presented in Fig. 9. Here a comparison between NMOS and PMOS devices is given. The bias current is fixed at 250 $\mu$A. For both device types the same W/L = 10 is used. In the 350nm node the NMOS device has larger flicker noise and a higher corner frequency than the PMOS device. This is expected since both NMOS and PMOS devices use an N type polysilicon gate. This prevents the channel from forming at the surface directly under the gate oxide in PMOS devices. In effect, PMOS devices have a buried channel with a smaller probability of random trapping/releasing of carriers at the oxide/channel interface. This mechanism is known as the main source of flicker noise in MOS devices [10].

However, the 65nm node exhibits the opposite behaviour: the NMOS device shows better performance than the PMOS device. This is because both device types have surface channels, i.e. PMOS uses a P type and NMOS an N type polysilicon gate. Therefore PMOS has no advantage over NMOS. This property is very technology dependent and hence important to examine. Table II summarises the total noise contribution for both process nodes.



Fig. 9 Noise performance: a) for 350nm, b) for 65nm

TABLE II TOTAL NOISE CONTRIBUTION

Integrated noise power [V<sup>2</sup>] in the frequency range 1Hz ÷ 100GHz

| Process | Length | NMOS 1/f | NMOS thermal | PMOS 1/f | PMOS thermal |
|---------|--------|----------|--------------|----------|--------------|
| 350nm   | 0.8µm  | 1.46n    | 0.14µ        | 0.37n    | 0.16µ        |
| 350nm   | 1.6µm  | 0.29n    | 56.2n        | 73.1p    | 58.1n        |
| 350nm   | 3.2µm  | 62.8p    | 16.4n        | 15.5p    | 16.3n        |
| 65nm    | 240nm  | 3.23n    | 59.9n        | 13.2n    | 0.106µ       |
| 65nm    | 480nm  | 0.73n    | 44.8n        | 2.74n    | 58.8n        |
| 65nm    | 960nm  | 0.16n    | 17.2n        | 0.68n    | 17.6n        |

Again, the results in Table II show that PMOS provides better noise performance than NMOS in the 350nm process node and vice versa in the 65nm process node.

#### **IV. CONCLUSION**

This paper examined problems and challenges concerning analog IC design. The general conclusion is that the sub-threshold region of operation and advanced design techniques are almost mandatory in submicron process nodes. It was also shown that some rules of thumb acquired from older process nodes, such as the one concerning noise performance, are no longer valid. Therefore it is of crucial importance to examine device behaviour before porting a design to a new process node. It is clear that at least the analog part has to be designed nearly from scratch.

#### ACKNOWLEDGEMENT

This work is funded by the Serbian Ministry of Education, Science and Technological Development within the project No. TR 32004, entitled: "Advanced technologies for measurement, control, and communication on the electric grid".

#### REFERENCES

- [1] Roy, K., Mukhopadhyay, S., Mahmoodi-Meimand, H., "Leakage Current Mechanisms and Leakage Reduction Techniques in Deep-submicrometer CMOS circuits", Proceedings of the IEEE, Vol. 91, No. 2, Feb., 2003, pp. 305-327.
- [2] Jovanović, B., Damnjanović, M., "Low Power Techniques for Leakage Power Minimization", Proceedings of LIV ETRAN Conference, Jun., 2010., pp. EL3.4-1-4.
- [3] Borkar, S., "Design Challenges of Technology Scaling", IEEE Micro, Vol. 19, No. 4, Aug., 1999, pp. 10-16.
- [4] Gonzalez, B., Gordon, M., "Supply and threshold voltage scaling for low power CMOS", IEEE Journal of Solid State Circuits, Vol. 32, Issue 8, Aug., 1997, pp. 1210-1216.
- [5] Litovski, V., et al., "IMPEG 2", Technical Solution laboratory prototype TR 6108B, Apr., 2008.
- [6] Sansen, W., "Analog Design Essentials", Springer, Dordrecht, Netherlands, 2006.
- [7] Dokić, B., Pajkanović, A., "Low Power CMOS Sub-Threshold Circuits", Int. Symp. μPRO2013, May, 2013.
- [8] "OCEAN Reference", Cadence Design Systems, Inc., San Jose, 2004.
- [9] http://en.wikipedia.org/wiki/GNU\_Octave, 2013.
- [10] Reimbold, G., "Modified 1/f Trapping Noise Theory and Experiments in MOS transistors Biased from Weak to Strong Inversion – Influence of interface states", IEEE Transactions on Electron Devices, Vol. ED-31, No. 9, Sep., 1984, pp. 1190-1198.

## SPICE Model of a Linear Variable Capacitance

Miona Andrejević Stošović, Marko Dimitrijević, and Vančo Litovski

*Abstract* - A simple model of a linear capacitance that is a function of a circuit, mechanical or environmental variable is introduced. It enables effective SPICE simulation of circuits containing this kind of capacitance, such as sensors, actuators and, generally, circuits with time varying linear capacitances. An illustrative example related to a MEMS capacitive pressure sensor is elaborated.

Keywords - Modeling, MEMS, SPICE Simulation.

#### I. INTRODUCTION

Capacitors with variable capacitance may be grouped into two categories: those controlled by their own voltage, and those controlled by a variable that may or may not be electrical. In the first case we in fact have a nonlinear capacitance, which may be treated as described in [1]. These are outside the scope of this paper. Here we will consider only linear capacitors whose capacitance is controlled by a circuit variable or by force, pressure, light, temperature or some other environmental variable.

To our knowledge the first capacitor with variable capacitance was patented by Nikola Tesla in 1896 [2]. It was the "vacuum variable capacitor". The variation of the capacitance was achieved by rotating one of the capacitor's plates, thus changing the overlapping area. This component went into series production in 1942. A much more frequently used capacitor of this kind was the one employed in the heterodyne to allow selection of the radio station by the wireless receiver. It was built as "a group of semicircular metal plates on a rotary axis ("rotor") that are positioned in the gaps between a set of stationary plates ("stator") so that the area of overlap can be changed by rotating the axis" [3].

Today we have many versions of capacitors whose capacitance is controlled by some environmental or circuit variable. For example, MEMS pressure sensors are most frequently capacitive [4,5]. In addition, one may use the capacitive properties of some semiconductor components as a light sensor [6] or one may synthesize an electronic circuit where the capacitance will be controlled by some circuit variable [7].

Finally, temperature is always a controlling variable of a capacitance, especially for supercapacitors, which are now emerging for everyday use [8]. In any case, to design a system based on such a component one needs a model that accurately captures the device's properties and, at the same time, is easily implementable in the most popular electronic circuit simulation program, SPICE [9].

Miona Andrejević Stošović and Marko Dimitrijević are with the University of Niš, Faculty of Electronic Engineering, Aleksandra Medvedeva 14, 18000 Niš, Serbia, E-mail: {miona.andrejevic, marko.dimitrijevic}@elfak.ni.ac.rs.

Vančo Litovski is with NiCAT - Niš Cluster of Advanced Technologies, 18000 Niš, Serbia (vanco.litovski@elfak.ni.ac.rs)

In what follows we will first state the problem. Then a solution will be proposed, followed by an illustrative example.

#### II. MODELLING THE LINEAR CAPACITOR CONTROLLED BY A VARIABLE PARAMETER

The capacitor is introduced into the circuit description by two equations constituting its model no matter what specific properties it has:

$$q_{c} = f(v_{c}, \underline{p}) \tag{1}$$

$$i_c = dq_c/dt. \tag{2}$$

Here $q_c$ is the charge stored in the capacitor, $v_c$ is the voltage at its terminals, $\underline{p}$ is a vector of controlling variables, $i_c$ is the capacitor's current and $t$ is the time variable. The parameter(s), $\underline{p}$, may be time varying, which means that, implicitly, we presume that the capacitor has a time varying capacitance. The main difference from usual capacitors with time varying capacitance is that in our case the capacitance is a controlled (not independent) variable and it will be treated as such.

In what follows, for simplicity, we will consider $\underline{p}$ to be one-dimensional, i.e. we will use $p$.

Modeling of this kind of component was addressed and solved in [1] and implemented in [10,11]. Implementation of that solution, however, requires intervention in the simulator's code, which was done with the Alecsis simulator [12] in [10,11]. Here, we will go for a modelling procedure based on circuit elements that are normally recognized by the SPICE input language.

For a linear capacitor (1) may be represented as follows:

$$q_{c} = C(p) \cdot v_{c} = C_{0} \cdot g(p) \cdot v_{c} \tag{3}$$

where  $C(p)=C_0 \cdot g(p)$  was introduced.  $C_0$  is a conveniently chosen constant. Substitution in (2) leads to the following:

$$i_{\rm c} = C_0 \cdot g(p) \cdot \frac{dv_{\rm c}}{dt} + C_0 \cdot v_{\rm c} \frac{dg(p)}{dp} \cdot \frac{dp}{dt} \,. \tag{4}$$

Note that the dependence $C(p)$ [or $g(p)$] and its derivative $\frac{dC(p)}{dp}$ [or $\frac{dg(p)}{dp}$] are considered known functions representing the capacitance's properties.

To get a circuit representation of (4) it will be rearranged in the following way:

$$i_{\rm c} = \left(C_0 \cdot \frac{dv_{\rm c}}{dt}\right) \cdot g(p) + \left(C_0 \cdot \frac{dp}{dt}\right) \cdot v_{\rm c} \cdot \frac{dg(p)}{dp}. \tag{5}$$

That may be seen as a parallel connection of two current sources:

$$i_{\rm cv} = \left(C_0 \cdot \frac{dv_{\rm c}}{dt}\right) \cdot g(p) \tag{6a}$$

and

$$i_{\rm cp} = \left(C_0 \cdot \frac{dp}{dt}\right) \cdot v_{\rm c} \cdot \frac{dg(p)}{dp}. \tag{6b}$$

The first one, $i_{cv}$, represents a product of the current of a linear constant capacitance $C_0$ and the voltage of a nonlinear controlled source $g(p)$. The second represents a product of three quantities: the quantity $\left(C_0 \cdot \frac{dp}{dt}\right)$, whose nature will be considered later on, the capacitor's voltage $v_c$, and the known derivative represented as a controlled source, $\frac{dg(p)}{dp}$. It is important to note that $p$ may be a circuit variable or a controlling (excitation) variable defined outside of the circuit. The modelling procedure has to take that into account by developing two variants of the model.
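The decomposition into the two sources (6a) and (6b) can be checked numerically against the defining relations (2) and (3). The Python sketch below does this with central differences. The control law $g(p)$, the waveforms of $p$ and $v_c$, and the value of $C_0$ are all hypothetical choices made only for this check.

```python
import math

C0 = 1.0e-12  # hypothetical constant capacitance [F]


def g(p):
    """Hypothetical control law, so that C(p) = C0 * g(p)."""
    return 1.0 + 0.5 * p * p


def dg_dp(p):
    """Analytical derivative of the hypothetical g(p)."""
    return p


def p_of_t(t):
    """Hypothetical controlling variable."""
    return math.sin(2000.0 * math.pi * t)


def v_of_t(t):
    """Hypothetical capacitor voltage [V]."""
    return math.cos(1000.0 * math.pi * t)


def ddt(f, t, h=1e-9):
    """Central difference approximation of df/dt."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


def current_two_sources(t):
    """Sum of the two parallel current sources (6a) and (6b)."""
    i_cv = C0 * ddt(v_of_t, t) * g(p_of_t(t))
    i_cp = C0 * ddt(p_of_t, t) * v_of_t(t) * dg_dp(p_of_t(t))
    return i_cv + i_cp


def current_from_charge(t):
    """Direct evaluation of (2) applied to the charge in (3)."""
    def charge(tau):
        return C0 * g(p_of_t(tau)) * v_of_t(tau)
    return ddt(charge, t)


t0 = 0.12e-3  # arbitrary time instant [s]
i_sum = current_two_sources(t0)
i_ref = current_from_charge(t0)
print(i_sum, i_ref)
```

Both evaluations agree to within the accuracy of the finite difference approximation, which is the property the circuit models of the following subsections rely on.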



Fig. 1 Circuit representation of (5) for the case when *p* is controlling (independent) variable

#### A. *p* is an independent controlling variable

In this case two controlled sources are available in advance:  $v_i = g(p)$  and  $v_j = \frac{dg(p)}{dp}$ . The quantity  $i_0 = \left(C_0 \cdot \frac{dv_c}{dt}\right)$  will be related to a constant capacitor while  $v_k = \left(C_0 \cdot \frac{dp}{dt}\right)$  will be treated as a voltage. That means that *p* will be considered a current and we will use an inductance of value  $C_0$  to get that voltage.

The complete circuit modelling (5) is depicted in Fig. 1.

#### B. *p* is a circuit variable

Again, $v_i = g(p)$ and $v_j = \frac{dg(p)}{dp}$ are available in advance. The quantity $i_0 = \left(C_0 \cdot \frac{dv_c}{dt}\right)$ will be related to a constant capacitor while $\left(C_0 \cdot \frac{dp}{dt}\right)$ will be treated depending on the nature of *p*. If *p* is a node voltage, $i_k = \left(C_0 \cdot \frac{dp}{dt}\right)$ will represent a current. This situation is depicted in Fig. 2.



Fig. 2 Circuit representation of (5) for the case when p is circuit variable - voltage

#### C. p is a branch current

Finally, if p is a branch current the circuit of Fig. 3 may be used.



Fig. 3 Circuit representation of (5) for the case when p is circuit variable - current

#### **III. IMPLEMENTATION EXAMPLE**

To illustrate, we will apply the above modeling procedure to a capacitive pressure sensor followed by a sampling circuit, as described in [10,11] and depicted in Fig. 4a. The dependence of the sensing capacitance (here denoted as $C_s$) on the outside pressure is depicted in Fig. 4b. It was approximated by a polynomial:

$$C(p) = (8.2623 - 4.799 \cdot p + 67.256 \cdot p^{2} - 213.97 \cdot p^{3} + 306.2 \cdot p^{4} - 200.83 \cdot p^{5} + 49.709 \cdot p^{6})\ [\text{pF}] \tag{7}$$

Least squares approximation was used to get the coefficients in (7). Fig. 4c represents the approximation error.

The iterative process was stopped at $R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 0.998$. Here $SS_{\text{tot}} = \sum_{i=1}^{n} [\hat{C}(p_i) - \overline{C}]^2$ is the total sum of squares; $SS_{\text{res}} = \sum_{i=1}^{n} [\hat{C}(p_i) - C(p_i)]^2$ is the sum of squares of residuals based on (7); $\overline{C} = \frac{1}{n} \sum_{i=1}^{n} \hat{C}(p_i)$; $\hat{C}(p_i)$ are samples taken from Fig. 4b, and $n=30$.
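The stopping criterion can be computed directly from the definitions of $SS_{\text{tot}}$ and $SS_{\text{res}}$. A minimal Python sketch follows. Since the 30 samples read from Fig. 4b are not listed here, the numbers used below are hypothetical and serve only to exercise the formula.

```python
def r_squared(samples, fitted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    n = len(samples)
    mean = sum(samples) / n
    ss_tot = sum((c - mean) ** 2 for c in samples)
    ss_res = sum((c - f) ** 2 for c, f in zip(samples, fitted))
    return 1.0 - ss_res / ss_tot


# Hypothetical samples and fitted values in pF (not the data of Fig. 4b).
c_samples = [8.0, 9.0, 10.0, 11.0, 12.0]
c_fitted = [8.1, 8.9, 10.0, 11.1, 11.9]

r2 = r_squared(c_samples, c_fitted)
print(f"R^2 = {r2:.3f}")
```

In a fitting loop one would raise the polynomial order until this value reaches the chosen threshold, here 0.998.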



Fig. 4. a) Capacitive pressure sensing circuit, b) characteristic of the pressure to capacitance conversion device, and c) the approximation error

For verification, a sinusoidal excitation was used:  $p(t)=0.7+0.7 \cdot \sin(\omega t)$  [kPa], where  $\omega=2000 \cdot \pi$  s<sup>-1</sup>.  $C_0=C(p_0=0)=8.2623$  pF. Fig. 5 represents the simulation results based on the use of Fig. 1 as the circuit model. The following parameters were used:  $V_{ref}=0.1V$  and  $C_r = C_0$ . The period of the sampling signal was  $T_s=30\mu s$ .
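To give a feeling for the quantities entering the model, the Python sketch below evaluates the polynomial (7) and its derivative, which the second source (6b) requires, along the sinusoidal excitation used for verification. It assumes that $p$ in (7) is expressed in the same unit as the excitation (kPa). The function names are hypothetical.

```python
import math

# Coefficients of (7) in ascending powers of p, capacitance in pF.
COEFFS_PF = [8.2623, -4.799, 67.256, -213.97, 306.2, -200.83, 49.709]


def capacitance_pf(p):
    """Evaluate C(p) from (7) with Horner's scheme."""
    result = 0.0
    for a in reversed(COEFFS_PF):
        result = result * p + a
    return result


def dcapacitance_dp(p):
    """Evaluate dC/dp of (7), again with Horner's scheme."""
    result = 0.0
    for k in range(len(COEFFS_PF) - 1, 0, -1):
        result = result * p + k * COEFFS_PF[k]
    return result


def pressure_kpa(t):
    """Excitation used for verification: 0.7 + 0.7 sin(2000 pi t)."""
    return 0.7 + 0.7 * math.sin(2000.0 * math.pi * t)


for t in (0.0, 0.25e-3, 0.5e-3):
    p = pressure_kpa(t)
    print(f"t={t:.2e} s  p={p:.3f} kPa  "
          f"C={capacitance_pf(p):.4f} pF  dC/dp={dcapacitance_dp(p):.4f} pF/kPa")
```

At $p=0$ the polynomial returns its constant term, which is the value taken for $C_0$ above.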



Fig. 5. Output signal obtained by SPICE simulation of the circuit of Fig. 4a.

#### IV. CONCLUSION

The subject of modeling a linear capacitance controlled by some dependent or independent variable was revisited. In our previous research we implemented the model in a way that required intervention in the simulation software's code. Here, we present an idea of how the same model may be implemented using only the SPICE input language and SPICE predefined circuit elements. The results obtained confirm the feasibility of the idea, while at the same time calling for a better approximation, i.e. an approximation that includes both the function and its derivative.

#### ACKNOWLEDGEMENT

This research was partially funded by The Ministry of Education and Science of Republic of Serbia under contract No. TR32004.

#### REFERENCES

- [1] Litovski, V., and Zwolinski, M., "VLSI Circuit Simulation and Optimization", Chapman and Hall, London, 1997.
- [2] http://en.wikipedia.org/wiki/Vacuum\_variable\_capacitor
- [3] http://en.wikipedia.org/wiki/Variable\_capacitor
- [4] Mrčarica, Ž., Litovski, V., and Detter, H., "Modeling and Simulation of Microsystems Using Hardware Description Language", Research Journal on Microsystem Technologies, Vol. 3, No. 2, Feb., 1997, pp. 80-85.
- [5] Juillard, J., Arcamone, J., Arndt, G., and Colinet, E., "Influence of excitation waveform and oscillator geometry on the resonant pull-in of capacitive MEMS oscillators", Published in: Design, Test, Integration and Packaging of MEMS/MOEMS (DTIP 2013), Barcelona, Spain, 2013, http://hal.archivesouvertes.fr/docs/00/83/05/33/PDF/DTIP auteur.pdf.
- [6] Haidar, S., "Light-controlled oscillator uses solar cell junction capacitance", EDN Network, August 8, 2013, http://www.edn.com/design/analog/4419444/Lightcontrolled-oscillator-uses-solar-cell-junctioncapacitance.
- [7] Abbadi, M.I. and Jaradat, A.-R. M., "Artificial Voltage-Controlled Capacitance and Inductance using Voltage-Controlled Transconductance", World Academy of Science, Engineering and Technology, International Science Index 20, Vol. 2, No. 8, pp. 782 - 786.
- [8] Kötz, R., Hahn, M., and Gallay, R., "Temperature behaviour and impedance fundamentals of supercapacitors", J. of Power Sources, Vol. 154, No. 2, 2006, pp. 550–555.
- [9] Nagel, L. W, and Pederson, D. O., "SPICE (Simulation Program with Integrated Circuit Emphasis)", Memorandum No. ERL-M382, University of California, Berkeley, Apr. 1973.
- [10] V. Litovski, M. Andrejevic, M. Zwolinski, "ANN Based Modeling, Testing, and Diagnosis of MEMS," IEEE 7th Seminar on Neural Network Applications in Electrical Engineering, NEUREL 2004, Sept. 2004, Belgrade, pp. 183-188.
- [11] Litovski, V., Andrejević, M, Zwolinski. M. "Behavioural modelling, simulation, test and diagnosis of MEMS using ANNs", Proc. Intl. Symp. Circuits and Systems, ISCAS'05, Kobe, Japan, 2005, pp. 5182-5185.
- [12] Mrčarica, Ž, Ilić, T., Glozić, D., Litovski, V., and Detter, H., "Mechatronic Simulation Using Alecsis: Anatomy of the Simulator", Proc. of the Eurosim'95, Vienna, Austria, Sept. 1995, pp. 651-656.

# Analysis of Outdoor Emissions from Printed Circuit Board Enclosed in Metallic Box with Aperture

Jugoslav Joković, Nebojša Dončov, Bratislav Milovanović and Tijana Dimitrijević

*Abstract* - In this paper, outdoor electromagnetic (EM) emissions from a printed circuit board (PCB) in a metallic enclosure with an aperture are considered. The Transmission Line Matrix (TLM) method is applied to account for the interactions between the PCB and the enclosure by including the basic physical features of the PCB, with feeding and terminations realized through TLM wire ports. Numerical results for EM field components inside and outside the enclosure are compared with each other as well as with measurements, and the effect of the aperture on EM emissions from the PCB is analysed.

*Keywords* – EM emissions, PCB, Enclosure, Aperture, TLM method.

#### I. INTRODUCTION

The rapid development and utilization of advanced digital techniques for information processing and transmission in modern communication systems have led to a further evolution of semiconductor technology into the nanometre regime. A number of complex components and devices, usually in high-density packaging, can be found in today's communication systems, resulting in a very challenging electromagnetic (EM) field environment. Therefore, electromagnetic compatibility (EMC) [1] has become one of the major issues in the design of these systems, especially of parts such as printed circuit boards (PCBs) and integrated circuits (ICs).

Clock rates that drive PCBs are now in the GHz frequency range in order to increase processing speed dramatically. Therefore, consideration of even a few higher harmonics of the clock rate takes the design of such circuits well into the microwave regime. PCBs are becoming increasingly complex and, as a consequence, quantifying their EM presence is more difficult. In the microwave frequency range, PCBs have dimensions of the order of several wavelengths and thus become efficient radiators and receivers of EM energy. In addition, high-density packaging, widely applied in PCB design, could cause a significant level of EM interference (EMI) between neighbouring PCBs, particularly when they are placed in an enclosed environment. These effects, in combination with the driving down of device switching voltage levels, are making signal quality/integrity and emission/susceptibility critical EMC issues in next generation high-speed systems.

Jugoslav Joković, Nebojša Dončov, Bratislav Milovanović and Tijana Dimitrijević are with the University of Niš, Faculty of Electronic Engineering, Department of Telecommunications, Aleksandra Medvedeva 14, 18000 Niš, Serbia, E-mail: {jugoslav.jokovic, nebojsa.doncov, bratislav.milovanovic, tijana.dimitrijevic}@elfak.ni.ac.rs

Differential numerical techniques, such as the finite-difference time-domain (FD-TD) method [2] and the transmission line matrix (TLM) method [3], are common tools for the computational analysis of numerous EM and EMC problems. However, a full-wave three-dimensional (3D) numerical simulation that accurately reproduces the EM field around a PCB usually requires substantial computing power and simulation run-time. Therefore, an efficient technique based on the equivalence principle [4], which provides simplified equivalent dipole models to accurately predict radiated emissions without reference to the exact details of the PCB, has recently been proposed [5]. The model is deduced from experimental near-field scanning and includes not only the excitation but also physical features of the PCB, such as its ground plane and dielectric body, both of which are very important in a closed environment. However, such a model can be very complex and time-consuming to run when it is incorporated into the conventional calculation algorithms of the FD-TD or TLM methods.

For some geometrically small but electrically important features (so-called fine features), such as wires, slots and air-vents, a few enhancements to the TLM method have been developed [6-8]. These compact models have been implemented either in the form of an additional one-dimensional transmission line network running through a tube of regular nodes or in the form of an equivalent lumped element circuit, which makes it possible to account for the EM presence of fine features without applying a very fine mesh around them. Compared to the conventional approach, these models yield a dramatic reduction in the computer resources required. A similar compact model could be developed for the PCB, allowing for an efficient implementation in the TLM algorithm and an accurate representation of the EM emissions and coupling of the PCB. Developing such a model requires an extensive full-wave analysis to fully characterize the EM presence of the PCB, either in free space or in an enclosed environment.

In this paper, we consider a basic test PCB placed in a rectangular metallic enclosure as a typical closed environment for PCBs. The PCB consists of an L-shaped microstrip track on an FR4 substrate [5]. In addition, an aperture in the top enclosure wall (e.g. used for outgoing or incoming cable penetration from and to the PCB) is also taken into account. The impact of the radiated emission of this simple PCB structure, with wire feed and terminated probes at its ends, on the EM field distribution is investigated. Numerical TLM results for the EM field at resonances outside the enclosure are compared with the corresponding results inside the enclosure based on simulations and measurements [5]. The EM field patterns inside and outside the enclosure are compared in order to analyse the impact of the aperture on radiated EM emissions from the PCB.

### II. MODELLING PROCEDURE

#### A. TLM method

In the TLM method, the 3D EM field distribution in a PCB structure in an enclosure is modelled by filling the space with a network of transmission lines and exciting a particular field component in the mesh by a voltage source placed on the excitation probe. The EM properties of the media in the substrate and enclosure are modelled by using a network of interconnected nodes. A typical node structure is the symmetrical condensed node (SCN), which is shown in Fig. 1. To operate at a higher time-step, a hybrid symmetrical condensed node (HSCN) [3] is used. An efficient computational algorithm for the scattering properties, based on enforcing continuity of the electric and magnetic fields and conservation of charge and magnetic flux, is implemented to speed up the simulation process. For accurate modelling of this problem, a finer mesh within the substrate and cells with an arbitrary aspect ratio, suitable for modelling particular geometrical features such as the microstrip track, are applied. External boundaries of the enclosure with an arbitrary reflection coefficient are modelled in TLM by terminating the link lines at the edge of the problem space with an appropriate load.
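The boundary treatment described above can be illustrated with the standard transmission-line relation between a terminating load and the reflection coefficient, $\rho = (Z_L - Z_0)/(Z_L + Z_0)$. The following is a minimal sketch only: the function names and the link-line impedance value are assumptions made for illustration and are not taken from the authors' TLM code.

```python
import math


def reflection_coefficient(z_load, z0):
    """Voltage reflection coefficient of a line of impedance z0 ended by z_load."""
    if math.isinf(z_load):
        return 1.0  # open circuit (ideal magnetic wall)
    return (z_load - z0) / (z_load + z0)


def load_for_reflection(rho, z0):
    """Load impedance that yields reflection coefficient rho (requires rho != 1)."""
    return z0 * (1.0 + rho) / (1.0 - rho)


Z0 = 376.73  # illustrative link-line impedance in ohms (assumed value)

for name, z_load in [("metallic wall (short circuit)", 0.0),
                     ("matched load", Z0),
                     ("open circuit", math.inf)]:
    print(name, reflection_coefficient(z_load, Z0))
```

An ideal metallic wall corresponds to a short-circuited link line, i.e. a reflection coefficient of -1, while a matched load gives zero reflection.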



Fig. 1. Symmetrical condensed node

#### B. Compact wire TLM model

In the TLM wire node, wire structures are considered as new elements that increase the capacitance and inductance of the medium in which they are placed. Thus, an appropriate wire network needs to be interposed over the existing TLM network to model the required deficit of electromagnetic parameters of the medium. In order to achieve consistency with the rest of the TLM model, it is most suitable to form wire networks by using TLM link and stub lines (Fig. 2) with characteristic impedances denoted as $Z_{wy}$ and $Z_{wsy}$, respectively.

An interface between the wire network and the rest of TLM network must be devised to simulate coupling between the EM field and the wire.



Fig. 2. Wire network

In order to model wire elements, wire network segments pass through the centre of the TLM node. In that case, coupling between the field and the wire coincides with the scattering event in the node, which makes the scattering matrix calculation more complex for the nodes containing a segment of the wire network. Because of that, the approach proposed in [6], which solves the interfacing between an arbitrarily complex wire network and arbitrarily complex TLM nodes without modifying the scattering procedure, is applied to the modelling of microstrip structures.

The single column of TLM nodes, through which wire conductor passes, can be used to approximately form the fictitious cylinder which represents capacitance and inductance of wire per unit length. Its effective diameter, different for capacitance and inductance, can be expressed as a product of factors empirically obtained by using known characteristics of TLM network and the mean dimensions of the node cross-section in the direction of wire running [6].

Following the experimental approach of using the inner conductor of a coaxial guide as a probe, numerical characterisation of the EM field inside the cavity can be done by introducing wire ports at the interface between the wire probes and the enclosure walls.

#### **III. NUMERICAL RESULTS**

TLM simulations are carried out to determine the EM emissions from a basic PCB structure in the form of a test board with a microstrip printed on a dielectric substrate, placed in a metallic enclosure with an aperture [5]. The numerical TLM model of EM emissions from this board inside the enclosure is verified against reference results based on equivalent dipole simulations and measurements [5, 9].

The basic test PCB is a 2-mm wide L-shaped microstrip track ( $l_1$ =40mm,  $l_2$ =20mm) on one side of a PCB<sub>x</sub>× PCB<sub>y</sub>× PCB<sub>z</sub>=(80×50×1.5)mm<sup>3</sup> board made from FR4 substrate with relative permittivity  $\varepsilon_r$  =4.5. The geometry of the board is shown in Fig. 3. The test board is mounted on the bottom of an enclosure in the form of a rectangular metallic box with dimensions  $a \times b \times c$ =(284×204×75)mm<sup>3</sup>. The PCB is fed by an external RF signal via a probe (with a radius of 0.5mm) placed at one end of the microstrip track (point A).



Fig. 3. Geometry of basic test PCB

In this structure, the enclosure is modeled by setting the reflection coefficients of the metallic walls, while the feed and terminated probes are described using the compact wire model, applying a generator and loads in TLM wire ports at the ends of the microstrip track. The aperture is also incorporated into the TLM model together with the enclosure, in order to simulate the real enclosed environment problem. According to the experimental setup, an aperture with dimensions  $a_1 \times b_1 = (60 \times 10) \text{mm}^2$ , placed on the top wall of the enclosure above the PCB, is modeled (Fig. 4).



Fig. 4. TLM model of basic test PCB in enclosure with aperture

When a PCB is inside an enclosure, it is of particular interest to investigate the behavior near the resonant frequencies of the enclosure. Since the PCB shifts the resonant frequencies and also changes the peak field magnitude, the modeling of PCB elements is essential in enclosed environment simulations. Therefore, numerical results for the resonant frequencies of the modeled closed environment structure with the PCB are analyzed. The enclosure should also be taken into account when an outdoor emission EMC compliance test of the PCB is conducted. Fig. 5 presents the TLM simulation results for the resonant frequencies, obtained from the vertical electric field sampled above the PCB at z=35mm (inside the enclosure) and z=90mm (outside the enclosure), at the center of the aperture in the xy plane (x=210mm, y=108mm). The resonances, listed in Table I, are in good agreement with results based on simulations and measurements [5].



Fig. 5. TLM numerical results of vertical component of electrical field from basic test PCB in enclosure with aperture

TABLE I

COMPARISON OF MEASURED AND TLM RESULTS FOR THE PCB IN ENCLOSURE

| Resonance | Measured [5] (MHz) | TLM simulation, inside (MHz) | TLM simulation, outside (MHz) |
|-----------|--------------------|------------------------------|-------------------------------|
| First     | 900                | 903                          | 905                           |
| Second    | 1290               | 1285                         | 1288                          |
| Third     | 1740               | 1749                         | 1789                          |
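As a rough plausibility check of the resonances in Table I, the resonant frequencies of an idealized rectangular cavity can be estimated from the textbook formula $f_{mnp} = \frac{c_0}{2}\sqrt{(m/a)^2 + (n/b)^2 + (p/c)^2}$. The sketch below assumes an empty, closed, perfectly conducting box with the enclosure dimensions given above, so the PCB and the aperture are ignored; the assignment of mode indices to the three resonances is likewise an assumption, not a statement from the paper.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def cavity_resonance(m, n, p, a, b, c):
    """Resonant frequency (Hz) of mode (m, n, p) in an ideal a x b x c cavity."""
    return 0.5 * C0 * math.sqrt((m / a) ** 2 + (n / b) ** 2 + (p / c) ** 2)


# Enclosure dimensions from the text, converted to metres
A, B, C = 0.284, 0.204, 0.075

# Modes with a vertical (z-directed) electric field and no variation along z
for mode in [(1, 1, 0), (2, 1, 0), (3, 1, 0)]:
    f_hz = cavity_resonance(*mode, A, B, C)
    print(mode, round(f_hz / 1e6, 1), "MHz")
```

Under these assumptions the three modes come out near 905, 1286 and 1746 MHz, which is close to the measured and simulated values and consistent with the statement that a small aperture does not critically affect the enclosure resonances.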

Fig. 6 shows the simulation results for the electric field component at resonant frequencies, in horizontal planes sampled at z=35 mm and z=90 mm above the PCB, representing EM emissions inside and outside the enclosure, respectively. The patterns of $E_z$ given by the TLM simulation at resonant frequencies (Fig.6.a), illustrating the EM field distribution of an enclosure due to the physical presence of a PCB and aperture, are in very good agreement with the corresponding results based on equivalent dipole simulations and measurements [5]. Since the results for the $E_z$ field component at resonant frequencies outside the enclosure (Fig.6.b) are sampled in a plane 15 mm above the aperture, the TLM mesh is extended to the space above the top wall of the enclosure, where the aperture is placed.

Generally, the patterns representing EM emissions outside the enclosure differ from the corresponding patterns inside the enclosure and are dominantly determined by the aperture position. Emissions outside the enclosure are much smaller than the levels at the corresponding resonances inside the enclosure. The numerical results confirm that the aperture does not critically affect the resonances when its dimensions are much smaller than those of the enclosure, so that it does not disturb the EM field distribution inside the enclosure. However, the aperture determines the level of the EM field radiated outside the enclosure. For the third resonance (at 1749 MHz), whose peak coincides with the aperture position, $x=(180\div240)$mm, the level of the EM field outside the enclosure is higher than for the first and second resonances.



Fig. 6. Patterns of  $E_z$  at resonances, given by the TLM simulation of PCB in enclosure with aperture: a) inside, b) outside

#### IV. CONCLUSION

Given that one of the main interests in EMC tests is the intensity and distribution of the fields radiated from the equipment under test (EUT), results are presented here for the EM emissions from a basic PCB structure placed in an enclosure with an aperture. Radiated emissions from the PCB are determined using a TLM model of a test board in an enclosure with an aperture, which accounts for the interactions between the physical presence of the PCB and the enclosure.

The resonant frequencies obtained using TLM simulation are compared with reference values found experimentally by observing the field magnitude inside the enclosure. The patterns of EM emissions at enclosure resonances inside and outside the enclosure with an aperture are compared, and the impact of the aperture on EM emissions is analyzed.

The simulation results for the test board show that the inclusion of basic features, such as the microstrip track and substrate, in addition to the wire elements for feeding and terminations, permits an accurate prediction of emitted fields in enclosures that interact with the PCBs inside. Generally, it is demonstrated here that the TLM method has the potential to characterize emissions from PCB structures in realistic environments, such as an enclosure with an aperture, making it possible to perform system EMC studies.

#### ACKNOWLEDGEMENT

This work was supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia, under the project III-44009.

#### References

- [1] Christopoulos, C., "Principles and Techniques of Electromagnetic Compatibility", 2<sup>nd</sup> edition, CRC Press, Boca Raton, FL, 2007.
- [2] Kunz, K. S., Luebbers, R. J., "The Finite Difference Time Domain Method for Electromagnetics", CRC Press, Boca Raton, FL, 1993.
- [3] Christopoulos, C., "The Transmission-Line Modelling (TLM) Method", IEEE Press in association with Oxford University Press, Piscataway, NJ, 1995.
- [4] Balanis, C. A., "Antenna Theory Analysis and Design", John Wiley and Sons, New York, 1997.
- [5] Tong, X., Thomas, D.W.P., Nothofer, A., Sewell, P., Christopoulos, C., "Modeling Electromagnetic Emission From Printed Circuit Boards in Closed Environment Using Equivalent Dipoles", IEEE Transactions on Electromagnetic Compatibility, Vol. 52, No. 2, May 2010, pp. 462-470.
- [6] Wlodarczyk, A. J., Trenkic, V., Scaramuzza, R., Christopoulos, C., "A Fully Integrated Multiconductor Model For TLM", IEEE Transactions on Microwave Theory and Techniques, Vol. 46, No. 12, December 1998, pp. 2431-2437.
- [7] Trenkic, V., Scaramuzza, R., "Modelling of Arbitrary Slot Structures Using Transmission Line Matrix (TLM) Method", International Symposium on Electromagnetic Compatibility, Zurich, Switzerland, 2001, pp. 393-396.
- [8] Dončov, N., Wlodarczyk, A. J., Scaramuzza, R., Trenkic, V., "Compact TLM Model of Air-vents", Electronics Letters, Vol. 38, No. 16, 2002, pp. 887-888.
- [9] Bratislav Milovanović, Jugoslav Joković, Nebojša Dončov, "Modelling of Printed Circuit Boards in Closed Environment Using TLM Method", Proc. of the SSSS 2012 Conference, Niš, Serbia, February 2012, pp. 93-96.

# Analysis of Electronic Structure of Carbon Nanotubes

Mariya Spasova, George Angelov, Anna Andonova, Tihomir Takov, and Marin Hristov

*Abstract* – Single-walled carbon nanotubes offer huge potential for carbon-based nanodevices and circuit integration. This paper studies single-walled carbon nanotubes with zigzag (3,0), armchair (3,3), and chiral (3,2) chirality. Bloch states, transmission spectrum, density of states, and band structure of the carbon nanotubes are simulated in the Virtual Nanolab simulator and validated against experimental measurements of the CNT structure.

Keywords - CNTFET, chirality, band structure, graphene.

#### I. INTRODUCTION

Single-Walled Carbon Nanotubes (SWNTs) are members of the carbon family of fullerenes. Their unique structure and nanometer size make them promising for nanoelectronic and nanomechanical applications [1]. The geometry of a carbon nanotube is that of a hollow cylinder in which the carbon atoms are arranged in a honeycomb crystalline lattice. The atomic structure of a single-walled carbon nanotube is described by its chiral indices (n, m).



Fig. 1. (a) The lattice structure of graphene, a honeycomb lattice of carbon atoms. (b) The energy of the conducting states as a function of the electron wavevector k. (c), (d) Graphene sheets rolled into tubes. This quantizes the allowed k's around the circumferential direction, resulting in 1D slices through the 2D band structure in (b). Depending on the way the tube is rolled up, the result can be either a metal (c) or a semiconductor (d) [2].

Mariya Spasova, George Angelov, Anna Andonova and Marin Hristov are with the Department of Microelectronics, Faculty of Electronic Engineering and Technology, Technical University of Sofia, 8 Kliment Ohridsky blvd., e-mail: {mls, angelov, ava, takov, mhristov}@ecad.tu-sofia.bg.

Graphene consists of a 2D honeycomb structure of sp<sup>2</sup>-bonded carbon atoms, as shown in Fig. 1(a). Its band structure has conducting states at $E_F$, but only at specific points along certain directions in momentum space, at the corners of the Brillouin zone, as shown in Fig. 1(b). The momentum of the electrons moving around the circumference of the tube is quantized. This quantization results in tubes that are either one-dimensional metals or semiconductors, depending on how the allowed momentum states compare to the preferred directions for conduction. Choosing the tube axis to point in one of the metallic directions results in a tube whose dispersion is a slice through the center of a cone (Fig. 1(c)). The tube acts as a 1D metal with a Fermi velocity $v_f = 8 \times 10^5$ m/s, comparable to typical metals. If the axis is chosen differently, the allowed k's take a different conic section that misses the cone apex, such as the one shown in Fig. 1(d), and the tube is a semiconductor [2].
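The dependence on the roll-up direction is often summarized by a simple zone-folding rule of thumb: a tube (n, m) is expected to be metallic when n - m is a multiple of 3, and semiconducting otherwise. The sketch below encodes only this rule; it is not part of the simulations in this paper, the function name is hypothetical, and the rule neglects curvature effects, which are known to matter for very narrow tubes such as those studied here, so first-principles results may differ.

```python
def zone_folding_type(n, m):
    """Classify an (n, m) nanotube with the simple zone-folding rule.

    Metallic if (n - m) is divisible by 3, semiconducting otherwise.
    Curvature effects are ignored (assumption of the simple model).
    """
    return "metallic" if (n - m) % 3 == 0 else "semiconducting"


for indices in [(3, 0), (3, 3), (3, 2)]:
    print(indices, zone_folding_type(*indices))
```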

This paper describes single-walled carbon nanotubes with different chirality. Section II calculates several structural parameters of tubes with different chirality. Section III presents the analysis of carbon nanotubes with zigzag (3,0), armchair (3,3), and chiral (3,2) chirality: Bloch states, transmission spectrum, density of states, and band structure are simulated in the Virtual Nanolab simulator.

#### II. THEORETICAL CALCULATION OF SWNTS

#### A. Structural description of carbon nanotubes

We use the solid-state physics convention to describe the graphene lattice structure, where the basis vectors of the graphene net, $\vec{a_1}$ and $\vec{a_2}$ ($a_1 = a_2 = a_0 = 0.246$ nm), are separated by an angle of 60°, as shown schematically in Fig. 2 in radial projection. The diameter, d, of the carbon nanotube is

$$d = \frac{a_0}{\pi} \sqrt{n^2 + nm + m^2} \tag{1}$$

and the helicity,  $\alpha$ , defined as the angle between the perimeter vector,  $\vec{A} = (n, m)$ , and the basis vector,  $\vec{a_1}$ , illustrated in Fig. 2, is

$$\alpha = \tan^{-1}\left(\frac{\sqrt{3}\,m}{2n+m}\right) \tag{2}$$

The indices $(n_c, m_c)$ of the tubule axis, which is perpendicular to the chiral vector $\vec{A}$, can be calculated from the orthogonality relationship between the tubule perimeter and the tubule axis

$$\frac{m_c}{n_c} = -\frac{2n+m}{n+2m} \tag{3}$$

The axial periodicity, c, of Carbon nanotube (n, m) can be obtained

$$c = a_0 \sqrt{n_c^2 + m_c^2 + n_c m_c} \tag{4}$$



Fig. 2. Schematic structure of graphene with basis vectors  $\overrightarrow{a_1}$  and  $\overrightarrow{a_2}$ . The shadowed rectangle is the radial projection of carbon nanotube (7,1) with perimeter  $\overrightarrow{A}$  and helical angle  $\alpha$  [3].

The atomic positions of a single-walled carbon nanotube can be conveniently expressed by the Cartesian coordinates  $(x_{j}, z_{j})$  in the radial projection, where the nanotube is projected onto a rectangle with sides *A* and *c* as described above [3].
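Equations (1) to (4) can be evaluated with a few lines of code. The following is a minimal sketch, assuming $a_0 = 2.46$ Å as in the text and taking $(n_c, m_c)$ as the smallest integer pair that satisfies Eq. (3); the function names are chosen here for illustration only.

```python
import math

A0 = 2.46  # graphene lattice constant in angstrom, as given in the text


def diameter(n, m, a0=A0):
    """Tube diameter d from Eq. (1), in angstrom."""
    return a0 / math.pi * math.sqrt(n * n + n * m + m * m)


def helicity_deg(n, m):
    """Helicity (chiral angle) alpha from Eq. (2), in degrees."""
    return math.degrees(math.atan2(math.sqrt(3.0) * m, 2 * n + m))


def axis_indices(n, m):
    """Smallest integer pair (n_c, m_c) satisfying Eq. (3)."""
    g = math.gcd(2 * n + m, n + 2 * m)
    return (n + 2 * m) // g, -((2 * n + m) // g)


def axial_period(n, m, a0=A0):
    """Axial periodicity c from Eq. (4), in angstrom."""
    nc, mc = axis_indices(n, m)
    return a0 * math.sqrt(nc * nc + mc * mc + nc * mc)


for n, m in [(3, 0), (3, 3), (3, 2)]:
    print((n, m), diameter(n, m), helicity_deg(n, m),
          axis_indices(n, m), axial_period(n, m))
```

For the three tubes simulated in Section III, (3,0), (3,3) and (3,2), Eq. (1) evaluates to roughly 2.35 Å, 4.07 Å and 3.41 Å, Eq. (2) to 0°, 30° and 23.41°, and Eq. (4) to about 4.26 Å, 2.46 Å and 18.57 Å.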

#### B. Calculation results

This section lists several structural parameters of tubes with different chirality, as given in Table I. The table covers zigzag, armchair, and chiral tubes with a fixed lattice constant $a_0 = 2.46$ Å.

TABLE I

List of chiral indices (n,m), diameter (d), and helicity $(\alpha)$ of carbon nanotubes

| Type     | Chirality (*n*, *m*) | <sub>1</sub> [Å] | <sub>2</sub> [Å] | d [Å] | α [°]  | $a_0$ [Å] |
|----------|----------------------|------------------|------------------|-------|--------|-----------|
| Zigzag   | (3, 0)               | 1.6735           | 1.6735           | 2.35  | 0      | 2.46      |
| Zigzag   | (6, 0)               | 1.6735           | 1.6735           | 4.70  | 0      | 2.46      |
| Zigzag   | (9, 0)               | 1.6735           | 1.6735           | 7.05  | 0      | 2.46      |
| Zigzag   | (18, 0)              | 1.6735           | 1.6735           | 14.10 | 0      | 2.46      |
| Armchair | (3, 3)               | 1.6735           | 1.6735           | 7.046 | 30     | 2.46      |
| Armchair | (6, 6)               | 1.6735           | 1.6735           | 8.141 | 30     | 2.46      |
| Armchair | (9, 9)               | 1.6735           | 1.6735           | 12.21 | 30     | 2.46      |
| Armchair | (10, 10)             | 1.6735           | 1.6735           | 13.56 | 30     | 2.46      |
| Chiral   | (3, 2)               | 1.6735           | 1.6735           | 3.414 | 23.413 | 2.46      |
| Chiral   | (6, 4)               | 1.6735           | 1.6735           | 6.829 | 23.413 | 2.46      |
| Chiral   | (9, 8)               | 1.6735           | 1.6735           | 11.54 | 28.054 | 2.46      |

#### **III. SIMULATIONS**

In this section, carbon nanotubes with chirality (3,0), (3,3), and (3,2) are simulated in the Virtual Nanolab simulator. Fig. 3 shows the bulk configurations of carbon nanotubes with these chiralities.



Fig. 3. Bulk configuration - a) (3,0), b) (3,3), c) (3,2).

Figs. 4, 5, and 6 show the Bloch states of carbon nanotubes with the above-mentioned chiralities. Bloch states can be used to investigate the symmetry of certain bands, which can be related to the transport properties of carbon nanotubes. Fig. 5 shows full symmetry between the "up" and "down" spin states. In Fig. 4, the tube with chirality (3, 3) shows lower bond strength than the carbon nanotube with chirality (3, 0). In Fig. 6, the tube is of chiral type and also shows full symmetry.



Fig. 4. Bloch States of Carbon nanotubes with chirality (3,3) - a) "up" spin orientation and b) "down" spin orientation



Fig. 5. Bloch States of Carbon nanotubes with chirality (3,0) - a) "up" spin orientation and b) "down" spin orientation



Fig. 6. Bloch States of Carbon nanotubes with chirality (3,2) - a) "up" spin orientation and b) "down" spin orientation

Figs. 7, 8, and 9 show the density of states of carbon nanotubes with the same (3,0), (3,3), and (3,2) chiralities. The density of states shows an energy gap around the Fermi level, as can be seen in Figs. 7 and 8. In Fig. 9 we do not observe an energy gap because the tube with chirality (3,2) is of chiral type.



Fig. 7. Density of States of Carbon nanotube with chirality (3,0)



Fig. 8. Density of States of Carbon nanotube with chirality (3,3)



Fig. 9. Density of States of Carbon nanotube with chirality (3,2)



Fig. 10. Transmission spectrum of Carbon nanotube with chirality (3,0)



Fig. 11. Transmission spectrum of Carbon nanotube with chirality (3,3)



Fig. 12. Transmission spectrum of Carbon nanotube with chirality (3,2)



Fig. 13. Band Structure of Carbon nanotube with chirality (3,0)



Fig. 14. Band Structure of Carbon nanotube with chirality (3,3)



Fig. 15. Band Structure of Carbon nanotube with chirality (3,2)

Figs. 10, 11, and 12 show the transmission spectra of carbon nanotubes with (3,0), (3,3), and (3,2) chirality. From the transmission spectrum of a carbon nanotube we can determine the bond energies when atoms are added along the nanotube's length. We observed that, for the carbon nanotube with chirality (3,0), when carbon atoms are added to the nanotube the bond energies between the newly added atom and each of the existing adjacent atoms are equal. The carbon nanotube with chirality (3,0) has the best transmission spectrum of the three.

Figs. 13, 14, and 15 show the band structure of carbon nanotubes with the above-mentioned chiralities. For nanotubes with chirality (3,2) and (3,0), DFT (Density Functional Theory) with LDA (Local Density Approximation) is used to calculate the band structure. For nanotubes with chirality (3,3), the Hückel calculation method is used. The *k*-point sampling is set to 5×5×5 for $n_a$, $n_b$, $n_c$, respectively.

#### IV. CONCLUSION

We analysed carbon nanotubes with zigzag (3,0), armchair (3,3), and chiral (3,2) chirality. The analysis showed that tubes with (3,0) chirality have better bond energy, while tubes with (3,2) chirality have more energy levels in their Brillouin zone. The transmission spectrum of the (3,0) carbon nanotube is the best among the chiralities that we have examined.

#### ACKNOWLEDGEMENT

This work was supported by the National Ministry of Science and Education of Bulgaria under Contract DFNI-I01/9-3.

#### REFERENCES

- [1] Nizam, R., Rizvi, M., Azam, A., "Calculating Electronic Structure of Different Carbon nanotubes and its Affect on Band Gap", International Journal of Science and Technologies, Vol. 1, No. 4, Oct., 2011, pp. 153-162.
- [2] McEuen, P., Fuhrer, M., Park, H., "Single-Walled Carbon Nanotube Electronics", IEEE Transactions on Nanotechnology, Vol. 1, No. 1, March 2002.
- [3] Qin, L., "Determination of the chiral indices (n, m) of carbon nanotubes by electron diffraction", IEEE Transactions on nanotechnology, Rep. Prog. Phys. 69, 2761 (2006).

# RFIC Passive Component Design and Simulation in Python

### Dušan Grujić, Pavle Jovanović, Dušan Krčum, and Milan Savić

Abstract – In this paper we present a Python based RFIC component layout generator - Passive Component Lab and a linear circuit simulator nicSim. These tools can be used for design and optimization of RFIC passive components and circuits. Capability to simulate S parameters in tabulated or State Space representation allows the simulation of linear amplifiers as well, by using transistor S parameters in given biasing conditions. Implementation in Python offers great flexibility, while the underlying speed and capacity of sparse matrix solvers available in a standard Python module SciPy, implemented in C, allows the simulation of real world problems.

Keywords - RFIC, Linear simulator, S parameters, Python.

#### I. INTRODUCTION

Circuit simulators have been a backbone of the IC design industry since the introduction of SPICE [1] forty years ago. Since then many open source and commercial derivatives have emerged, clearly revealing the academic interest and commercial potential. General purpose simulators are designed for simulation speed and capacity, while it is also desirable that they be flexible and easily extensible. To achieve the speed and capacity, the simulators are usually implemented in C or C++, while flexibility and extensibility are ensured by the use of complex data structures or the object-oriented paradigm to abstract the device models from the simulator engine.

In this paper, we propose a different approach, focused on flexibility and extensibility. Speed and capacity are of secondary importance, since the simulator is designed for a specific purpose, and is intended to be light-weight. However, this does not mean that the simulator is not usable in real-world problems.

Python is an interpreted programming language which has lately been embraced by the scientific community. Acceptance of Python can be attributed to several factors: ease of use, large base of scientific modules for numerical and symbolic calculation, quality data plotting and presentation, and last but not least, open source license.

We have designed two software tools in Python – Passive Component Lab for automatic layout generation, and nicSim for linear circuit simulation. Both tools can be used standalone, but they show their full potential when used in conjunction with Python based optimizers, as shown in the proposed Design Flow.

Pavle Jovanović and Dušan Krčum are with the School of Electrical Engineering, University of Belgrade, Bul. kralja Aleksandra 73, 11000 Belgrade, Serbia.

Dušan Grujić, Pavle Jovanović, Dušan Krčum, and Milan Savić are with NovelIC, Omladinskih brigada 86p, 11070 Novi Beograd, Serbia. E-mail: {first.last name}@novelic.com

#### II. PASSIVE COMPONENT LAB

Passive Component Lab is a collection of Python classes used for automatic generation of integrated passive components, such as inductors and transformers. All components are fully parametrized, and the output can be exported to CAD tools or EM simulators.

Technology information is contained in a separate technology class, allowing the reuse of the same code in all generator classes. The information is read from a text file describing the grid, available layers, connectivity information and basic design rules. Additional information, such as layer conductivity and the integrated circuit BEOL cross section, i.e. dielectric layers and their conductivity, can be included for automatic 3D model generation. A simple example of a technology file is given below.

```
grid = 0.01

layer TM2 metal
    GDSIINum = 134
    GDSIIType = 0
endlayer
layer TM1 metal
    GDSIINum = 126
    GDSIIType = 0
endlayer
layer TopVia2 via
    GDSIINum = 133
    GDSIIType = 0
    topmet = TM2
    botmet = TM1
    viaEnc = 0.5
    viaSize = 0.9
    viaSpace = 1.06
endlayer
```
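To make the structure of the technology file concrete, the following sketch reads a file in the format shown above into nested dictionaries. It is only an illustration under stated assumptions: `parse_technology` and `_convert` are hypothetical helpers written for this example and are not the parser used by Passive Component Lab, whose internals are not described here.

```python
def _convert(value):
    """Convert a field to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_technology(text):
    """Parse 'key = value' lines and 'layer NAME TYPE ... endlayer' blocks."""
    tech = {"layers": {}}
    current = None  # layer dictionary being filled, or None at top level
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        tokens = line.split()
        if tokens[0] == "layer":
            current = {"type": tokens[2]}
            tech["layers"][tokens[1]] = current
        elif tokens[0] == "endlayer":
            current = None
        elif "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            target = current if current is not None else tech
            target[key] = _convert(value)
    return tech


EXAMPLE = """
grid = 0.01
layer TM2 metal
    GDSIINum = 134
    GDSIIType = 0
endlayer
layer TopVia2 via
    GDSIINum = 133
    topmet = TM2
    botmet = TM1
    viaEnc = 0.5
    viaSize = 0.9
    viaSpace = 1.06
endlayer
"""

tech = parse_technology(EXAMPLE)
print(tech["grid"], tech["layers"]["TopVia2"]["viaSize"])
```

Keeping the design rules in plain dictionaries like this is one simple way for several generator classes to share the same technology data.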

The information contained in the example technology file is sufficient for generating DRC-correct inductors and transformers in the top two metals. Example Python code for generating a transformer balun with 4 primary windings and 3 secondary windings is given below.

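```
# Technology and balun4x3 are classes provided by Passive Component Lab
tech = Technology("technology.txt")  # read layers and design rules
bal = balun4x3(tech)                 # balun with 4 primary, 3 secondary windings

r = 300  # Outer radius
w = 8    # Winding width
s = 3    # Winding spacing
signalLayer = "TM2"
underPassLayer = "TM1"

bal.emVias = True  # Merge vias
bal.setupGeometry(r, w, s, signalLayer, underPassLayer, "octagon")
bal.genGDSII('bal4x3_w8_s3.gds')  # export the layout to a GDSII file
```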

Example Python code reads the technology information, generates the balun geometry and exports it to GDSII file, which can be imported into any CAD tool or EM simulator. Connectivity information is extracted from technology file, and the vias connecting metals TM1 and TM2 are inserted automatically at appropriate locations. Property emVias is set to merge adjacent vias and simplify the model for EM simulation. The generated transformer balun is shown in Fig. 1.

Code similar to the example above has been successfully used in the development of a high performance UWB CMOS transceiver. The transceiver contained a multitude of inductors and transformer baluns, which would be impossible to draw manually. Performance optimization was greatly simplified by automatic layout generation.



Fig. 1. Transformer balun generated by example code

The automatic layout generator can generate inductors with an arbitrary number of windings, in steps of 1/4 of a winding, and with arbitrary geometry. Both square and octagonal inductors are supported, giving a degree of freedom to choose between maximum inductance for a given area and improved quality factor. Various transformer balun geometries are supported, with transformation ratios from 1:1 to 1:4.

New geometries can be easily added, since the common geometries are sub-classed and can be easily reused. For example, filling a given area with vias is implemented as a method of a base class, which is inherited by all layout generators. The user only has to specify the coordinates of opposing edges of a rectangle and a via layer; the via drawing method reads the design rules and layer mapping from the technology to produce a DRC-correct layout in terms of via size, spacing, and enclosure. Such a degree of flexibility allows the creation of very complex parametrized geometries in a matter of minutes.
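The via-filling step described above can be sketched as follows. This is a hedged illustration, not the Passive Component Lab method: `fill_vias` is a hypothetical function, the rectangle is assumed to be given by two opposite corners, and the via array is simply started at the minimum enclosure from the lower-left corner rather than centred. All arithmetic is done in integer grid units so that every coordinate stays on the manufacturing grid.

```python
def fill_vias(x0, y0, x1, y1, via_size, via_space, via_enc, grid=0.01):
    """Return via rectangles (xl, yl, xh, yh) that fill the given area.

    Via size, spacing and enclosure are honoured; the far-side enclosure
    is always at least via_enc.
    """
    def to_grid(value):
        return int(round(value / grid))  # length in integer grid units

    xmin, xmax = sorted((to_grid(x0), to_grid(x1)))
    ymin, ymax = sorted((to_grid(y0), to_grid(y1)))
    size, space, enc = to_grid(via_size), to_grid(via_space), to_grid(via_enc)
    pitch = size + space

    def count(length):
        usable = length - 2 * enc  # room left after enclosure on both sides
        return 0 if usable < size else (usable - size) // pitch + 1

    vias = []
    for i in range(count(xmax - xmin)):
        for j in range(count(ymax - ymin)):
            x = xmin + enc + i * pitch
            y = ymin + enc + j * pitch
            vias.append((x * grid, y * grid,
                         (x + size) * grid, (y + size) * grid))
    return vias


# Rules taken from the TopVia2 layer of the example technology file
vias = fill_vias(0.0, 0.0, 8.0, 8.0, via_size=0.9, via_space=1.06, via_enc=0.5)
print(len(vias), "vias")
```

With the TopVia2 rules from the example technology file, an 8 × 8 overlap area (the winding width used in the balun example) accommodates a 4 × 4 array of vias.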

Additional features, such as predictive models for passive structures are under development. Predictive models for passive structures [2] provide a circuit model for a given passive structure, and can be used for quick performance evaluation and optimization. They can be easily added to layout generator, since the technology file and geometry specification are already present. Being able to generate both predictive circuit models and physical geometry of passive components will make the Passive Component Lab a very powerful tool for every RFIC designer.

#### III. NICSIM

Linear circuit simulator nicSim is fully implemented in Python, and is intended to be self-contained, with minimum dependency on external libraries. Reducing the dependencies on external libraries and modules makes it easy to install and use, and also light-weight in terms of memory and disk space requirements. The only external dependencies are SciPy [3] and NumPy [4], which are commonly pre-installed on many Linux based systems.

The simulator itself is minimalistic, having only the features that are required for the purpose of simulation and optimization of circuits containing integrated passive components. It uses Modified Nodal Analysis (MNA) [5] formulation for solving the electrical circuit.

Sparse matrix solvers from SciPy are used for solving the system of the form Ax=b. Direct matrix inversion is not used in solving the system. Instead, a dedicated function for solving the sparse system of linear equations is used. This way the ill-conditioned system can be efficiently handled by element pivoting implemented in a dedicated solver function.
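To illustrate why pivoting matters for MNA systems, the sketch below solves a small MNA system with Gaussian elimination and partial pivoting. It deliberately swaps the SciPy sparse solver for a tiny dense pure-Python routine, so it is not how nicSim solves the system; the function name `solve_with_pivoting` and the test circuit (a 1 V source driving two 1 kΩ resistors in series) are assumptions made for the example. Note the zero on the diagonal in the row of the voltage source, which is typical for MNA matrices.

```python
def solve_with_pivoting(a, b):
    """Solve a x = b by Gaussian elimination with partial (row) pivoting."""
    n = len(b)
    # Work on an augmented copy so that the caller's data stays untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest magnitude in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - acc) / m[row][row]
    return x


# Unknowns: node voltages v1, v2 and the source branch current i_src.
g1 = g2 = 1.0 / 1000.0
a = [[g1, -g1, 1.0],        # KCL at node 1
     [-g1, g1 + g2, 0.0],   # KCL at node 2
     [1.0, 0.0, 0.0]]       # branch equation v1 = 1 V (zero diagonal entry)
b = [0.0, 0.0, 1.0]
v1, v2, i_src = solve_with_pivoting(a, b)
```

For realistic circuit sizes a sparse routine such as `scipy.sparse.linalg.spsolve` would be the natural replacement for this dense sketch.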

The underlying matrix solvers in SciPy are written in C, so the solver performance is not affected by the interpreted nature of Python. This way the best of both worlds is utilized: the flexibility and rapid development of Python and the sheer speed of C.

The simulator has no frontend netlist parser, since the circuit is built directly from Python. Circuit components, such as resistors, capacitors, inductors, and independent and dependent voltage and current sources, are implemented as Python classes. They can be instantiated like any Python object and added to a circuit with a simple call to the appropriate method. The circuit itself and the simulations are also Python classes, so there is no limit on the number of circuits or simulations, except for the system memory. Currently supported simulation types are DC, AC and S parameter simulation.

Example Python code for S parameter simulation of 3 dB matched attenuator is given below.

```
import nicSim as sim
import numpy as np

# Build the circuit: a pi attenuator between two ports
cir = sim.circuit()
res = sim.resistor
r1 = res('R1', 'N1', 'N2', 17.6)    # series resistor
r2 = res('R2', 'N1', '0', 292.4)    # shunt resistor at port 1
r3 = res('R3', 'N2', '0', 292.4)    # shunt resistor at port 2
p1 = sim.port('P1', 'N1', '0')
p2 = sim.port('P2', 'N2', '0')
cir.addElement([r1, r2, r3, p1, p2])

# S parameter analysis in 1 MHz steps starting at 1 MHz
spsim = sim.sp_analysis(['P1', 'P2'])
f_list = np.arange(1e6, 100e6, 1e6)
spsim.simulate(cir, f_list)
```

Simulation results can be easily plotted in publication quality with Matplotlib [6], by using the following code.

```
from pylab import *

s11 = spsim.sParam[:,0,0]        # input reflection
s11db = 20*log10(abs(s11))
s21 = spsim.sParam[:,1,0]        # forward transmission
s21db = 20*log10(abs(s21))
f = spsim.f
plot(f, s11db)
plot(f, s21db)
```
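As a sanity check on the attenuator values used in the example, the expected S parameters of the resistive π network can be computed in closed form from its ABCD matrix. The sketch below is independent of nicSim; the function name `pi_attenuator_s_params` and the 50 Ω reference impedance are assumptions of this example.

```python
import math


def pi_attenuator_s_params(r_series, r_shunt, z0=50.0):
    """Return (S11, S21) of a symmetric pi attenuator in a z0 system."""
    y = 1.0 / r_shunt
    # ABCD matrix of the shunt-series-shunt cascade.
    a = 1.0 + r_series * y
    b = r_series
    c = 2.0 * y + r_series * y * y
    d = a
    denom = a + b / z0 + c * z0 + d
    s11 = (a + b / z0 - c * z0 - d) / denom
    s21 = 2.0 / denom
    return s11, s21


s11, s21 = pi_attenuator_s_params(17.6, 292.4)
s11_db = 20 * math.log10(abs(s11))
s21_db = 20 * math.log10(abs(s21))
```

With 17.6 Ω and 292.4 Ω the transmission comes out very close to -3 dB with a very small reflection, and since the network is purely resistive the simulated curves should be flat over frequency.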

All components in nicSim are implemented as Python classes. The component class contains the node names, parameters and other data, such as frequency response, and methods for parameter evaluation and matrix stamping. This approach is similar to the one used in SPICE simulator, where the simulator provides the interfaces for matrix stamping and does not go into details of device implementation. Adding new devices to nicSim is easy, since the simulator does not need to be changed – only the new component class with appropriate methods for initialization and matrix stamping has to be designed.
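The division of labour between simulator and component classes can be sketched as follows. The class `Resistor`, its `stamp` method and the helper `build_matrix` are hypothetical names chosen for this illustration and do not reproduce the nicSim interface; only the conductance stamp of a resistor is shown.

```python
class Resistor:
    """Two-terminal resistor that knows how to stamp itself."""

    def __init__(self, name, n1, n2, value):
        self.name, self.nodes, self.value = name, (n1, n2), value

    def stamp(self, matrix, index):
        """Add the conductance stamp; index maps node name -> row/column."""
        g = 1.0 / self.value
        n1, n2 = (index.get(n) for n in self.nodes)  # None marks ground '0'
        for row, col, sign in ((n1, n1, 1), (n2, n2, 1),
                               (n1, n2, -1), (n2, n1, -1)):
            if row is not None and col is not None:
                matrix[row][col] += sign * g


def build_matrix(components, index):
    """The simulator side: it only calls stamp() on every component."""
    size = len(index)
    matrix = [[0.0] * size for _ in range(size)]
    for comp in components:
        comp.stamp(matrix, index)
    return matrix


index = {"N1": 0, "N2": 1}  # ground node '0' is not part of the matrix
parts = [Resistor("R1", "N1", "N2", 17.6),
         Resistor("R2", "N1", "0", 292.4),
         Resistor("R3", "N2", "0", 292.4)]
g_matrix = build_matrix(parts, index)
```

A new device type would only need its own class with a `stamp` method; `build_matrix` stays unchanged.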

Python is a dynamically typed programming language, so the type of a variable can change during program execution. This opens an opportunity for exciting and diverse features that would be very difficult to implement in statically typed languages, such as C.

One of the most obvious uses of dynamic typing is the use of expressions in component parameters. Early SPICE implementations allowed only numerical constants as component parameter values. Newer SPICE versions and commercial simulators allow variables and a limited set of expressions in component parameters. Implementing such a feature is by no means simple, since it requires the design of an expression parser and evaluator. Python can handle variables and expressions in component parameters in a very simple manner. The passed component parameter can be a numerical constant or a string expression. The Python built-in *eval* function evaluates the given expression in the scope of the defined variables. Evaluating an expression with variables is nothing new. What is new is the possibility to pass a function reference as a component parameter value. The provided function is called each time the component parameter is evaluated, and its return value is used. This makes it possible to have a component value given by an arbitrarily complex function, a table look-up, or even a database or file.
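The three kinds of parameter values can be handled by a single small helper. The following is a minimal sketch, assuming a hypothetical function `evaluate_parameter` and a plain dictionary of user variables; it is not the nicSim implementation.

```python
import math


def evaluate_parameter(value, variables):
    """Resolve a component parameter given as number, expression or function."""
    if callable(value):
        # Function reference: called on every evaluation.
        return value()
    if isinstance(value, str):
        # String expression: evaluated in the scope of the defined variables.
        scope = {"__builtins__": {}, "math": math}
        return eval(value, scope, dict(variables))
    return value  # plain numerical constant


variables = {"r0": 50.0, "k": 2.0}
r_const = evaluate_parameter(17.6, variables)
r_expr = evaluate_parameter("r0 * k + 1", variables)
r_func = evaluate_parameter(lambda: 292.4, variables)
```

Since `eval` executes arbitrary code, expressions should only come from trusted sources; emptying `__builtins__` as above reduces the exposed names but is not a security boundary.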

Another example of the use of Python dynamic typing in component values is symbolic circuit solving. Symbolic circuit solvers have been designed in a variety of ways [7], but Python provides a natural way of implementing them. To convert a standard linear circuit simulator into a symbolic one, only the matrix stamping and solver routines need to be changed. The component matrix stamping routine would have to stamp a string expression instead of a numeric value. The solver would have to be replaced by a symbolic solver; such solvers are readily available for Python. This approach was used in the Ahkab circuit simulator [8], which can solve a circuit both numerically and symbolically.

Besides the linear, frequency independent components, nicSim supports n port S parameter blocks. This feature is important since the simulator is intended for RF passive network design and optimization, and it can include measured or simulated component S parameters. Additionally, S parameter block allows the simulation of linear amplifiers, where the transistors are replaced by S parameters. This way the whole amplifier can be simulated and optimized.

S parameters are usually provided in Touchstone file format, which contains the S matrix elements at a given number of frequencies. Availability of S parameters at discrete frequency points requires the use of interpolation techniques. Another commonly used way of representing S parameters is State Space representation:

$$\begin{aligned} E\dot{x} &= Ax + Bu\\ y &= Cx + Du \end{aligned} \tag{1}$$

where x is the state vector, u is the input vector, and y is the output vector. The transfer function in the frequency domain, which is the S parameter, is then given by:

$$S_{ii}(s) = C(sE - A)^{-1}B + D \tag{2}$$

Formulations given in (1) and (2) are commonly used by EM simulators to perform the adaptive frequency sweep, for example Agilent Momentum. As a result, State Space representation of S parameters is available and response can be calculated at any frequency within the valid frequency range.
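Evaluating (2) at an arbitrary frequency is straightforward once the state-space matrices are known. In general this requires solving a linear system with the matrix sE - A; the sketch below covers only the special case of diagonal E and A (a pole-residue form), which is an assumption made here to keep the example free of external libraries. The function name and the single-pole test data are hypothetical.

```python
import math


def s_param_from_state_space(freq_hz, e_diag, a_diag, b, c, d):
    """Evaluate S(s) = C (sE - A)^-1 B + D for one port pair, diagonal E and A."""
    s = 1j * 2.0 * math.pi * freq_hz
    total = d
    for e_k, a_k, b_k, c_k in zip(e_diag, a_diag, b, c):
        # For diagonal E and A the inverse is taken state by state.
        total += c_k * b_k / (s * e_k - a_k)
    return total


# Hypothetical single-pole response S(s) = 0.5 / (s/w0 + 1), w0 = 2*pi*1 GHz
w0 = 2.0 * math.pi * 1e9
s_dc = s_param_from_state_space(0.0, [1.0], [-w0], [w0], [0.5], 0.0)
s_corner = s_param_from_state_space(1e9, [1.0], [-w0], [w0], [0.5], 0.0)
```

Because the response is an analytic function of frequency, no interpolation between tabulated points is needed.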

#### IV. DESIGN FLOW

The design flow using Passive Component Lab and nicSim is shown in Fig. 2. The layout generator and circuit simulator described in this paper can be coupled with a user-supplied Python-based optimizer to fit the given circuit model parameters to simulated or measured S parameters of a passive component, such as an integrated inductor or transformer. Since both Passive Component Lab and nicSim are written in Python, the user-supplied optimizer can read the simulation data directly from data structures, which eliminates the need for data translation. This is a major advantage, since there are many Python-based optimizers to choose from.

Fig. 2. Design Flow with user supplied optimizer

An example of circuit parameter optimization to fit the measurements is given in Fig. 3. In this case, the passive component is an inductor, and the assumed circuit model is the single-$\pi$ inductor model [9]. The Q factor of the optimized circuit model is very close to the measured one up to the self-resonant frequency.
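The fitting step can be illustrated on a much simpler model than the single-π model used in Fig. 3. The sketch below fits a series R-L model to complex impedance data by closed-form least squares and computes the resulting Q factor. The simplified model, the function names and the synthetic "measurement" are all assumptions of this example; a real flow would use a general-purpose optimizer and the full inductor model.

```python
import math


def fit_series_rl(freqs_hz, z_measured):
    """Least-squares fit of Z = R + j*w*L to measured complex impedances."""
    omegas = [2.0 * math.pi * f for f in freqs_hz]
    r_fit = sum(z.real for z in z_measured) / len(z_measured)
    l_fit = (sum(w * z.imag for w, z in zip(omegas, z_measured))
             / sum(w * w for w in omegas))
    return r_fit, l_fit


def q_factor(freq_hz, r, l):
    """Quality factor of the series R-L model, Q = w*L/R."""
    return 2.0 * math.pi * freq_hz * l / r


# Synthetic data generated from R = 2 ohm, L = 1 nH (illustrative values)
freqs = [1e9, 2e9, 3e9]
z_meas = [complex(2.0, 2.0 * math.pi * f * 1e-9) for f in freqs]
r_fit, l_fit = fit_series_rl(freqs, z_meas)
q_1ghz = q_factor(1e9, r_fit, l_fit)
```

Since the data were generated from the model itself, the fit recovers the original values; with measured data the residual would indicate how well the chosen model describes the component.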

#### V. CONCLUSION

Despite being an interpreted programming language, Python can be used for specialized circuit simulators. Penalty in performance can in some cases be of secondary importance, when flexibility and extensibility are of interest.



Fig. 3. Optimized circuit model Q factor vs measurements

#### References

- [1] Nagel, L. W., and Pederson, D. O., "SPICE (Simulation Program with Integrated Circuit Emphasis)", Memorandum No. ERL-M382, University of California, Berkeley, Apr. 1973.
- [2] Gao, W., and Yu, Z., "Scalable compact circuit model and synthesis for RF CMOS spiral inductors", IEEE Transactions on Microwave Theory and Techniques, Vol. 54, No. 3, March 2006, pp. 1055-1064.
- [3] SciPy, available at http://www.scipy.org/
- [4] NumPy, available at http://www.numpy.org/
- [5] Litovski, V., Zwolinski, M., "VLSI Circuit Simulation and Optimization", Chapman and Hall, London, 1997.
- [6] Matplotlib, available at http://matplotlib.org/
- [7] Đorđević, S., Petković, P., "A Hierarchical Approach to Large Circuit Symbolic Simulation", Microelectronics Reliability, 41, (2001), pp. 2941-2049.
- [8] Ahkab, available at http://ahkab.github.io/ahkab/
- [9] Cao, Y. et al., "Frequency-independent equivalent-circuit model for on-chip spiral inductors", IEEE Journal of Solid-State Circuits, Vol. 38, No. 3, March 2003, pp. 419-426.

# The Influence of Interface and Semiconductor Bulk Traps Generated Under HEFS on MOSFET's Electrical Characteristics

Sanja Aleksić, Danijela Pantić, and Dragan Pantić

Abstract - In this paper, the impact of defects (donor and acceptor traps) generated at the $Si/SiO_2$ interface and in the semiconductor bulk, when the gate oxide is exposed to high electric field stress (HEFS), on the electrical characteristics of n-channel MOS and VDMOS transistors is analysed and simulated. Using simulation, it is shown how and why the generated traps affect the electrical characteristics of n-channel MOS and n-channel VDMOS transistors.<sup>1</sup>

Keywords - TCAD, traps, HEFS, interface, bulk, MOS.

#### I. INTRODUCTION

The stability and reliability of MOSFET electrical characteristics are among the most important requirements in device and circuit design. It is known that under the influence of different effects, such as irradiation, high temperature (NBTI) or high electric field stress (HEFS), neutral or charged defects (traps) are generated at the Si/SiO<sub>2</sub> interface, as well as in the semiconductor and gate oxide bulk [1-3]. The formed traps may cause degradation of the electrical characteristics of the semiconductor device over time, which ultimately affects the reliable operation of the electronic circuits and devices in which the MOSFETs are integrated.

Over the last few decades a number of physical models explaining these instabilities have been proposed [4,5]. The majority of these models cannot analyse the influence of all generated traps, because these very complex processes are still not well understood and the impact of a large number of parameters has to be taken into account. The Silvaco TCAD simulation tools, in which advanced physical models are incorporated [6,7], make it possible to analyse separately the influence of different parameters, models and mechanisms on the device electrical characteristics. The effects of interface and semiconductor bulk traps generated under HEFS on the electrical characteristics of n-channel MOS and VDMOS transistors are investigated in this paper.

#### II. NUMERICAL MODEL

The presence of charged defects or traps at the $Si/SiO_2$ interface, or in the oxide and semiconductor bulk, has a significant impact on the device electrical characteristics. These traps change the space charge density and the potential distribution in the device structure, and also influence the recombination statistics and carrier mobility. The amount of charged bulk and interface traps increases significantly when the devices are exposed to radiation or high electric field stress (HEFS), and in these cases an accurate simulation of the electrical characteristics of semiconductor devices has to take into account the space charge that comes from stress-induced charged traps.

There are three different mechanisms that add space charge directly to the right-hand side of Poisson's equation in addition to the ionized donor and acceptor impurities: interface fixed charge, interface trap states and bulk trap states. Interface fixed charge is controlled by the interface boundary condition, while the interface and bulk trap charges, $Q_{IT}$ and $Q_{BT}$, are added directly into Poisson's equation:

$$\operatorname{div}(\varepsilon \nabla \varphi) = q(n - p - N_D^+ + N_A^-) - (Q_{IT} + Q_{BT}) \tag{1}$$

The energy levels associated with interface and semiconductor bulk traps lie in the forbidden gap, and the traps exchange charge with the conduction and valence bands through the emission and capture of electrons. The net charge $Q_{IT}$ that comes from the presence of ionized donor-like $(N_{DT}^+)$ and acceptor-like $(N_{AT}^-)$ traps at the Si/SiO<sub>2</sub> interface is defined as:

$$Q_{IT} = q(N_{DT}^+ - N_{AT}^-) = Q_{DT}^+ - Q_{AT}^- \tag{2}$$

When the energy level of a donor-like trap (DT) lies in the forbidden gap near the bottom of the conduction band, the trap releases an electron and becomes positively charged. The increase of positive charge $(+Q_{IT})$ at the Si/SiO<sub>2</sub> interface reduces the threshold voltage $V_{TH}$ of the n-channel MOSFET (Fig. 1). Conversely, an acceptor-like trap (AT) is ionized (negatively charged) when its energy level lies near the top of the valence band. In that case, the AT is filled with an electron, and the presence of $-Q_{IT}$ at the interface causes an increase in the threshold voltage of the n-channel MOSFET (Fig. 1). The change of charge at the Si/SiO<sub>2</sub> interface also affects other electrical characteristics, such as the leakage current $I_L$ and the saturation current $I_{SAT}$.
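The sign of the threshold voltage shift can be checked with the first-order textbook estimate $\Delta V_{TH} = -Q_{IT}/C_{OX}$, which treats the trap charge as a sheet charge located at the interface. The sketch below is only a hand calculation, not the TCAD model used in this paper; it assumes that the given traps are fully ionized, uses areal densities in cm<sup>-2</sup>, and the trap density of 10<sup>11</sup> cm<sup>-2</sup> is an illustrative value.

```python
Q_E = 1.602176634e-19      # elementary charge, C
EPS0 = 8.8541878128e-14    # vacuum permittivity, F/cm
EPS_OX = 3.9               # relative permittivity of SiO2


def interface_charge(n_dt_ionized, n_at_ionized):
    """Net interface trap charge Q_IT = q (N_DT+ - N_AT-), in C/cm^2."""
    return Q_E * (n_dt_ionized - n_at_ionized)


def vth_shift(q_it, d_ox_cm):
    """First-order threshold shift dV_TH = -Q_IT / C_ox, in volts."""
    c_ox = EPS0 * EPS_OX / d_ox_cm
    return -q_it / c_ox


d_ox = 10e-7  # 10 nm gate oxide expressed in cm
shift_donor = vth_shift(interface_charge(1e11, 0.0), d_ox)
shift_acceptor = vth_shift(interface_charge(0.0, 1e11), d_ox)
```

For these values the shift is a few tens of millivolts, negative for ionized donor-like traps and positive for ionized acceptor-like traps, in line with the qualitative behaviour described above.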

<sup>&</sup>lt;sup>1</sup>Sanja Aleksić and Dragan Pantić are with the Department of Microelectronics, Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, 18000 Niš, Serbia, E-mail: {sanja.aleksic, <u>dragan.pantic@elfak.ni.ac.yu</u> }.

Danijela Pantić is with High School of Electrical Engineering "Nikola Tesla", A. Medvedeva 20, 18000 Niš, Serbia, E-mail: danijela@etstesla.ni.ac.rs.



Fig. 1. The schematic representation of DT and AT ionization processes and the generation of the charge at  $Si/SiO_2$  interface.

Traps generated under HEFS in the semiconductor bulk influence the electrical characteristics of the MOSFET in a slightly different way. DT and AT ionization processes in the semiconductor bulk proceed in a similar way and under the same conditions, but in this case the change of electron concentration in the semiconductor bulk is more important than the formed positive or negative charge (Fig. 2). When the energy level of the DT in the forbidden gap moves from E<sub>V</sub> towards E<sub>C</sub>, the probability of its ionization increases: it releases an electron and becomes positively charged. The resulting increase of the electron concentration in the semiconductor bulk reduces the threshold voltage $V_{TH}$, while at the same time the saturation current $I_{SAT}$ of the n-channel MOSFET increases. For AT, the ionization probability increases when the energy level moves from E<sub>C</sub> towards E<sub>V</sub>. An AT in the semiconductor bulk captures an electron and becomes neutral. This recombination process significantly reduces the saturation current $I_{SAT}$, which is particularly important in the n-channel VDMOSFET, because its current, after leaving the channel, flows vertically through the epitaxial layer and the Si substrate to the drain contact.
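The dependence of the ionization probability on the trap energy level can be illustrated with equilibrium Fermi-Dirac statistics. This is a simplified sketch and not the trap model of the device simulator: the degeneracy factor is set to one, the Fermi level is assumed to sit at mid-gap, and the function names are hypothetical. A donor-like trap counts as ionized when it is empty, an acceptor-like trap when it is filled with an electron.

```python
import math

K_T = 0.02585  # thermal energy at about 300 K, eV


def donor_ionized_fraction(e_trap, e_fermi):
    """Probability that a donor-like trap is empty (positively charged)."""
    return 1.0 / (1.0 + math.exp((e_fermi - e_trap) / K_T))


def acceptor_ionized_fraction(e_trap, e_fermi):
    """Probability that an acceptor-like trap is filled (negatively charged)."""
    return 1.0 / (1.0 + math.exp((e_trap - e_fermi) / K_T))


# Energies in eV measured from the valence band edge E_V; E_C is at about 1.12 eV.
E_FERMI = 0.56  # assumed mid-gap Fermi level, for illustration only
levels = [0.2, 0.56, 0.9]
donor = [donor_ionized_fraction(e, E_FERMI) for e in levels]
acceptor = [acceptor_ionized_fraction(e, E_FERMI) for e in levels]
```

The donor fraction grows as the level moves towards E<sub>C</sub>, while the acceptor fraction grows as the level moves towards E<sub>V</sub>, which matches the trends stated in the text.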



Fig. 2. The schematic representation of DT and AT ionization and generation-recombination processes in semiconductor bulk.

#### **III. SIMULATION RESULTS**

In this section, the impact of DT and AT at the Si/SiO<sub>2</sub> interface and in the semiconductor bulk on the electrical characteristics of an n-channel MOS transistor and an n-channel VDMOS power transistor is presented. The simulations were carried out using the process simulator ATHENA [8] and the device simulator ATLAS [9], which are integral parts of the Silvaco TCAD software package.

#### A. n-channel MOS transistor – interface traps

The impact of DT and AT generated at the Si/SiO<sub>2</sub> interface under HEFS is analysed on a typical n-channel MOS transistor fabricated in a standard 0.35 µm CMOS technology. The gate oxide thickness is $d_{OX} = 10$ nm. The influences of interface DT and AT on $V_{TH}$, $I_L = I_D(V_{GS} = 0.1\,\mathrm{V})$ and $I_{SAT} = I_D(V_{GS} = 5\,\mathrm{V})$, obtained by analysing the simulation results, are summarized in Tab. I.

TABLE I: The influence of interface DT and AT on the electrical characteristics of n-channel MOS transistor.

|  | V<sub>TH</sub> | I<sub>L</sub> | I<sub>SAT</sub> |
|---|---|---|---|
| **N<sub>DT</sub>** > 10<sup>14</sup> cm<sup>-3</sup> ↑ | ↓ | ↑ | const |
| **E.L<sub>DT</sub>** > 0.6 eV → E<sub>C</sub> | ↓ | ↑ | const |
| **E.L<sub>DT</sub>** > 0.6 eV → E<sub>V</sub> | const | const | const |
| **N<sub>AT</sub>** > 10<sup>11</sup> cm<sup>-3</sup> ↑ | ↑ | ↓ | ↓ |
| **E.L<sub>AT</sub>** > 0.8 eV → E<sub>C</sub> | ↓ | const | const |
| **E.L<sub>AT</sub>** > 0.8 eV → E<sub>V</sub> | const | ↓ | const |

The distributions of potential in the poly-Si, electric field in the gate oxide, ionized DT density at the Si/SiO<sub>2</sub> interface, and electron concentration in silicon are shown in Fig. 3. The change of the threshold voltage $V_{TH}$ of the n-channel MOS transistor, depending on the interface DT parameters (donor trap density $N_{DT}$ and associated energy level $E.L_{DT}$), is given in Fig. 4.







Fig. 4. The change of the threshold voltage  $V_{TH}$  of n-channel MOS, depending on interface DT parameters.

The distributions of potential in the poly-Si, electric field in the gate oxide, ionized AT density at the Si/SiO<sub>2</sub> interface, and electron concentration in silicon are shown in Fig. 5. The changes of the threshold voltage $V_{TH}$ and leakage current $I_L$ of the n-channel MOS transistor, depending on the interface AT parameters (acceptor trap density $N_{AT}$ and associated energy level $E.L_{AT}$), are given in Fig. 6 and Fig. 7, respectively. The sets of AT parameters for the ionized acceptor trap density and electron concentration shown in Fig. 5 are marked in Fig. 6.



Fig. 5. Interface AT influence on n-channel MOS transistor: a) ionized acceptor traps density, b) electron concentration.



Fig. 6. The change of the threshold voltage  $V_{TH}$  of n-channel MOS, depending on interface AT parameters.



Fig. 7. The change of the leakage current  $I_L$  of n-channel MOS, depending on interface AT parameters.

#### B. n-channel MOS transistor – bulk traps

The influences of bulk DT and AT on $V_{TH}$, $I_L$ and $I_{SAT}$, obtained by analysing the simulation results, are summarized in Tab. II. The presence of DT and AT in the semiconductor bulk also affects the value of $I_{SAT}$ due to the recombination process.

TABLE II: The influence of bulk DT and AT on the electrical characteristics of n-channel MOS transistor.

|  | V<sub>TH</sub> | I<sub>L</sub> | I<sub>SAT</sub> |
|---|---|---|---|
| **N<sub>DT</sub>** > 10<sup>16</sup> cm<sup>-3</sup> ↑ | ↓ | ↑ | ↑ |
| **E.L<sub>DT</sub>** > E<sub>V</sub> → E<sub>C</sub> | ↓ | ↑ | ↑ |
| **N<sub>AT</sub>** > 10<sup>16</sup> cm<sup>-3</sup> ↑ | ↑ | ↓ | ↓ |
| **E.L<sub>AT</sub>** > E<sub>C</sub> → E<sub>V</sub> | ↑ | ↓ | ↓ |

The distributions of potential in the poly-Si, electric field in the gate oxide, ionized DT density in the semiconductor bulk, electron concentration and total current density in silicon are given in Fig. 8, while the change of the threshold voltage $V_{TH}$ of the n-channel MOS transistor, depending on the bulk DT parameters (donor trap density $N_{DT}$ and associated energy level $E.L_{DT}$), is given in Fig. 9. The sets of DT parameters for the ionized donor trap density, electron concentration and total current density shown in Fig. 8 are marked in Fig. 9.

The distributions of potential in the poly-Si, electric field in the gate oxide, ionized AT density in the semiconductor bulk, electron concentration and carrier recombination velocity in silicon are shown in Fig. 10, while the change of the threshold voltage $V_{TH}$ of the n-channel MOS transistor, depending on the bulk AT parameters (acceptor trap density $N_{AT}$ and associated energy level $E.L_{AT}$), is given in Fig. 11.



Fig. 8. Bulk DT influence on n-channel MOS transistor: a) ionized donor traps density, b) electron concentration and c) total current density.



Fig. 9. The change of the threshold voltage  $V_{TH}$  of n-channel MOS, depending on bulk DT parameters.



Fig. 10. Bulk AT influence on n-channel MOS transistor: a) ionized acceptor traps density, b) electron concentration and c) recombination velocity.



Fig. 11. The change of the threshold voltage  $V_{TH}$  of n-channel MOS, depending on bulk AT parameters.

#### C. n-channel VDMOS transistor – interface traps

The impact of DT and AT generated at the Si/SiO<sub>2</sub> interface under HEFS is analysed on a typical n-channel VDMOS transistor. The channel length and gate oxide thickness of the VDMOS transistor are $l_{CH} = 1$ µm and $d_{OX} = 60$ nm, while the threshold voltage is $V_{TH} = 3.7$ V. The influences of interface DT and AT on $V_{TH}$, $I_L = I_D(V_{GS} = 0.4\,\mathrm{V})$ and $I_{SAT} = I_D(V_{GS} = 10\,\mathrm{V})$, obtained by analysing the simulation results, are given in Tab. III.

TABLE III: The influence of interface DT and AT on the electrical characteristics of n-channel VDMOS transistor.

|  | V<sub>TH</sub> | I<sub>L</sub> | I<sub>SAT</sub> |
|---|---|---|---|
| **N<sub>DT</sub>** > 10<sup>12</sup> cm<sup>-3</sup> ↑ | → | ↑ | const |
| **E.L<sub>DT</sub>** > 0.6 eV → E<sub>C</sub> | ↓ | ↑ | const |
| **E.L<sub>DT</sub>** > 0.6 eV → E<sub>V</sub> | const | ↓ | const |
| **N<sub>AT</sub>** > 10<sup>11</sup> cm<sup>-3</sup> ↑ | ↑ | const | ↓ |
| **E.L<sub>AT</sub>** > 0.8 eV → E<sub>C</sub> | ↓ | const | ↓ |
| **E.L<sub>AT</sub>** > E<sub>V</sub> → 0.8 eV | const | const | ↓ |

As can be seen, the interface DT has the same impact on the electrical characteristics of the VDMOS transistor as in the case of the MOS transistor, while the impact of the interface AT is slightly different. Unlike in the MOS transistor, the interface AT here reduces the saturation current and does not affect the leakage current.

The distributions of potential in the poly-Si, electric field in the gate oxide, ionized DT density at the Si/SiO<sub>2</sub> interface, and electron concentration are shown in Fig. 12, while the change of the threshold voltage $V_{TH}$ of the n-channel VDMOS transistor, depending on the interface DT parameters (donor trap density $N_{DT}$ and associated energy level $E.L_{DT}$), is given in Fig. 13. The values of the threshold voltage $V_{TH}$ and the ionized interface DT concentration $N_{DT}^+$ are given in Fig. 12, while the sets of DT parameters for the ionized donor trap density and electron concentration shown in Fig. 12 are marked in Fig. 13.



Fig. 12. Interface DT influence on n-channel VDMOS transistor: a) ionized donor traps density, b) electron concentration.



Fig. 13. The influence of interface DT parameters: donor trap density  $N_{DT}$  and energy level E.L<sub>DT</sub> on threshold voltage  $V_{TH}$ .

The distributions of potential in the poly-Si, electric field in the gate oxide, ionized AT density at the Si/SiO<sub>2</sub> interface, and total current density are shown in Fig. 14, while the change of the threshold voltage $V_{TH}$ of the n-channel VDMOS transistor, depending on the interface AT parameters (acceptor trap density $N_{AT}$ and associated energy level $E.L_{AT}$), is given in Fig. 15.



Fig. 14. Interface AT influence on n-channel VDMOS transistor: a) ionized acceptor traps density, b) total current density.



Fig. 15. The influence of interface AT parameters: acceptor trap density  $N_{AT}$  and energy level E.L<sub>AT</sub> on threshold voltage  $V_{TH}$ .

#### D. n-channel VDMOS transistor – bulk traps

The influences of bulk DT and AT on $V_{TH}$, $I_L$ and $I_{SAT}$, obtained by analysing the simulation results, are summarized in Tab. IV.

TABLE IV: The influence of bulk DT and AT on the electrical characteristics of n-channel VDMOS transistor.

|  | V<sub>TH</sub> | I<sub>L</sub> | I<sub>SAT</sub> |
|---|---|---|---|
| **N<sub>DT</sub>** > 10<sup>15</sup> cm<sup>-3</sup> ↑ | ↓ | ↑ | ↑ |
| **E.L<sub>DT</sub>** > E<sub>V</sub> → E<sub>C</sub> | ↓ | ↓ | ↑ |
| **N<sub>AT</sub>** > 10<sup>??</sup> cm<sup>-3</sup> ↑ | no inf. | ↑ | ↓ |
| **E.L<sub>AT</sub>** > E<sub>C</sub> → 0.5 eV | no inf. | const | ↓ |
| **E.L<sub>AT</sub>** > 0.5 eV → E<sub>V</sub> | no inf. | const | ↓↓↓↓ |

The bulk DT has the same impact on the electrical characteristics of the VDMOS transistor as in the case of the MOS transistor. The bulk AT reduces the saturation current due to the ionization of traps and the recombination of electrons. This is particularly evident when the energy level of the AT approaches the top of the valence band (Fig. 16).



Fig. 16. The influence of the bulk AT associated energy level on the saturation current of the n-channel VDMOS transistor.



Fig. 17. Bulk DT influence on n-channel VDMOS transistor: a) ionized donor traps density, b) recombination velocity.

The distributions of potential in the poly-Si, electric field in the gate oxide, ionized DT and AT density in the semiconductor bulk, electron concentration, recombination velocity and total current density in silicon are given in Figs. 17 and 18, while the change of the threshold voltage $V_{TH}$ of the n-channel VDMOS transistor, depending on the bulk DT parameters, is given in Fig. 19, where the sets of DT parameters for the given distributions are marked.





#### IV. CONCLUSION

The impact of defects generated at the $Si/SiO_2$ interface and in the semiconductor bulk under HEFS on the electrical characteristics of n-channel MOS and n-channel VDMOS power transistors is analysed and simulated separately, using the program ATHENA for complete technology process simulation and the device simulator ATLAS, which are integral parts of the Silvaco TCAD software package.

#### ACKNOWLEDGEMENT

This work has been supported by the Ministry of Education and Science of the Republic of Serbia, under the project TR 33035.



Fig. 19. The influence of interface DT parameters: donor trap density  $N_{DT}$  and energy level E.L<sub>DT</sub> on threshold voltage  $V_{TH}$ .

#### REFERENCES

- [1] Wang, T., Chiang, L., Zous, N., Chang, T., Huang, C., "Oxide Traps in MOSFET's by Using a Subthreshold Transient Current Technique", IEEE Transactions on Electron Devices, Vol. 45, No. 8, 1998, pp. 1791-1796.
- [2] Alwan, M., Beydoun, B., Ketata, K., Zoaeter, M., "Bias temperature instability from gate charge characteristics investigations in n-channel power MOSFET", Microelectronics Journal, Vol. 38, 2007, pp. 727-734.
- [3] Benlatreche, M.S., Rahmoune, F., Toumiat, O., "Experimental investigation of Si-SiO2 interface traps using equilibrium voltage step technique", Informacije MIDEM, Vol. 41, No. 3, 2011, pp. 168-170.
- [4] Esseni, D., Bude, J-D., Selmi, L., "On Interface and Oxide Degradation in VLSI MOSFETs—Part II: Fowler–Nordheim Stress Regime", IEEE Transactions on Electron Devices, Vol. 49, No. 2, 2002, pp. 254-263.
- [5] Cartier, E., "Characterization of the hot-electron-induced degradation in thin SiO<sub>2</sub> gate oxide", Microelectronics Reliability, Vol. 38, No. 2, 1997, pp. 201-211.
- [6] Aleksić, S., Pešić, B., Pantić, D., "Simulation of Semiconductor Bulk Trap Influence on the Electrical Characteristics of the n-channel Power VDMOS Transistor", Informacije MIDEM, Vol. 43, No. 2, 2013, pp. 124-130.
- [7] Aleksić, S., Bjelopavlić, D., Pantić, D., "Simulation of Bulk Traps Influence on the Electrical Characteristics of VDMOS Transistor", Proc. of XLVI International Scientific Conference on Information, Communication and Energy Systems and Technologies - ICEST 2011, pp. 271-274, Niš, Serbia, June 2011.
- [8] ATHENA User's Manual, SILVACO, Inc., CA, USA, 2013.
- [9] ATLAS User's Manual, SILVACO, Inc., CA, USA, 2013.

# Wireless Ad Hoc Network Simulation in Cloud Environment

### Leonid Djinevski, Sonja Filiposka, Igor Mishkovski and Dimitar Trajanov

*Abstract* - In this paper we present the use of a cloud computing environment for fast simulation of wireless ad hoc networks in 3D terrains. Considering 3D terrains involves larger amounts of data, so the network simulation requires more compute-intensive calculations. We evaluate the use of a cloud computing environment for terrain-aware network simulation in order to optimally utilize the available hardware resources. Our experimental results show that there is no significant decrease in performance when migrating to a private cloud environment.

Keywords – Ad hoc, HPC, cloud computing, network simulation.

#### I. INTRODUCTION

Cloud computing is becoming one of the most popular fields in IT, finding applications in many diverse areas [1]. Cloud resource providers offer many services to their customers, so the performance capabilities of a given cloud solution are difficult to determine. In this paper, we focus on the computational performance of the cloud environment.

Network simulators are tools used by researchers for testing new scenarios and protocols in a controlled and reproducible environment. That is, the user can represent various topologies, simulate network traffic using different protocols, visualize the network and measure its performance. The drawback of these tools is their scaling: the execution time of a simulation of a medium to large network can range from a few hours to a few days, or even weeks. Such long waiting times make network simulators unsuitable for investigating protocols. In order to accelerate the simulation process, turning to high performance computing is the best choice once the sequential implementation has been optimized.

We have presented a 3D terrain aware extension [3] of the NS-2 network simulator [4] that enables simulation of wireless ad hoc networks, considering the terrain details. Additionally, we developed a parallel implementation of the extension for GPU execution [5]. We have further optimized the performance of the extension [6] by introducing the Triangular Irregular Network (TIN) terrain representation [7] [8]. Also, in [9] we analysed the performance impact of the GPU memory configurations and proposed an approach for obtaining optimal performance. A parallel message-passing implementation for distributed execution was also developed in [10]. In this paper we are interested in the overall execution time of the network simulation running on a cloud instance of OpenStack [2]. We present the results of migrating the NS-2 network simulation to a private cloud, and we compare the obtained results with on-premise execution of the same network simulator under the same simulation scenarios.

Leonid Djinevski is with the Faculty of Information and Communication Technologies, FON University, Av. Vojvodina 5, 1000 Skopje, Macedonia, E-mail: {leonid.djinevski}@fon.edu.mk.

Sonja Filiposka, Igor Mishkovski and Dimitar Trajanov are with the Faculty of Computer Science and Engineering, Ss Cyril and Methodius University, St. Rugjer Boshkovikj 16, 1000 Skopje, Macedonia, E-mail: {sonja.filiposka, igor.mishkovski, dimitar.trajanov}@finki.mk.

The rest of the paper is organized as follows. In Section II we give a brief introduction to network simulators and our extension of the NS-2 simulator for wireless ad hoc networks. In Section III we present the migration of the simulator to the cloud environment. The testing methodology is described in Section IV, followed by the obtained results in Section V, which show the impact of the extension of the NS-2 simulator on the duration of the simulations. We conclude our findings in Section VI.

#### II. NETWORK SIMULATION

The NS-2 network simulator is considered a de facto standard simulator in the research community, especially because of the large number of implemented protocols. Although there is an NS-3 [11] version of the network simulator, whose goal is to improve NS-2 by building the architecture from scratch, users still report problems with wireless ad hoc scenarios. The major reported issue is the very long simulation time, especially when using the Ad hoc On-Demand Distance Vector (AODV) routing protocol. In a nutshell, NS-3 is not yet mature enough to be as popular and verified as NS-2, which is still active. Since our terrain extension for the NS-2 network simulator is not OTcl based, it can be easily ported to the NS-3 network simulator.

The NS-2 network simulator contains two models (free space and ground reflection) for wireless radio propagation when evaluating wireless ad hoc network performance. These propagation models are commonly used by simulation modelers [12]; however, they are not terrain aware. By considering the terrain, the network simulation is brought closer to real-life scenarios. In this paper, we utilize Durkin's algorithm [3], which includes terrain details represented by TIN data.

#### A. Durkin's algorithm for wireless radio propagation

In this subsection we present Durkin's algorithm, which accounts for diffraction and shadowing effects. The classical Fresnel solution is used to obtain the diffraction loss, which is described by equations (1), (2) and (3):

$$G_d(dB) = 20 \log |F(v)|. \tag{1}$$

$$v = h \sqrt{\frac{2(d_1 + d_2)}{\lambda d_1 d_2}} . \tag{2}$$

where $F(v)$ is the Fresnel integral, which is a function of the Fresnel-Kirchhoff diffraction parameter $v$ defined in (2). The approximation of (1) is given by:

$$G_d(dB) = \begin{cases} 0, & v \leq -1 \\ 20 \log(0.5 - 0.62v), & -1 \leq v \leq 0 \\ 20 \log(0.5 e^{-0.95v}), & 0 \leq v \leq 1 \\ 20 \log\left(0.4 - \sqrt{0.1184 - (0.38 - 0.1v)^2}\right), & 1 \leq v \leq 2.4 \\ 20 \log\left(\frac{0.225}{v}\right), & v > 2.4 \end{cases} \tag{3}$$

Depending on whether there is line of sight (LOS), and whether the first Fresnel zone clearance is adequate or inadequate, Durkin's algorithm uses the diffraction parameter $v$ to determine the path loss for a given transmitter/receiver (TR) pair [13][14].
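The diffraction step can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the code of the NS-2 extension: the function names are hypothetical, $h$ is taken as the obstacle height above the LOS path, $d_1$ and $d_2$ as the distances from the obstacle to the transmitter and receiver, and $\lambda$ as the wavelength (the usual meaning of the symbols in (2)). For $1 \leq v \leq 2.4$ the sketch uses the form given in Rappaport [12], $0.4 - \sqrt{0.1184 - (0.38 - 0.1v)^2}$.

```python
import math


def fresnel_kirchhoff_v(h, d1, d2, wavelength):
    """Diffraction parameter v from equation (2); all lengths in metres."""
    return h * math.sqrt(2.0 * (d1 + d2) / (wavelength * d1 * d2))


def diffraction_gain_db(v):
    """Piecewise approximation of the knife-edge diffraction gain in dB."""
    if v <= -1.0:
        return 0.0
    if v <= 0.0:
        return 20.0 * math.log10(0.5 - 0.62 * v)
    if v <= 1.0:
        return 20.0 * math.log10(0.5 * math.exp(-0.95 * v))
    if v <= 2.4:
        # Form taken from Rappaport [12] (assumption of this sketch).
        inner = math.sqrt(0.1184 - (0.38 - 0.1 * v) ** 2)
        return 20.0 * math.log10(0.4 - inner)
    return 20.0 * math.log10(0.225 / v)


# Illustrative input values only (not taken from the experiments):
# a 25 m obstruction midway on a 2 km link at a wavelength of 1/3 m.
v_example = fresnel_kirchhoff_v(h=25.0, d1=1000.0, d2=1000.0, wavelength=1.0 / 3.0)
gain_example = diffraction_gain_db(v_example)
```

In a terrain aware simulation such a function would be evaluated for every obstructing terrain edge between a TR pair, which is why the cost grows with the number of TIN triangles.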

#### III. NETWORK SIMULATION IN CLOUD INFRASTRUCTURE

Cloud computing is currently a very popular and growing field, which offers many services and enables efficient utilization of information technology resources. Thus, many educational and research institutions are adopting cloud computing in order to rationalize the way they manage resources [15]. The organization of the conventional and cloud computing systems that run our NS-2 network simulation application is presented in Figures 1 and 2.



Fig. 1. Conventional computing environment



Fig. 2. Cloud computing environment

The presented cloud computing environment implements a private cloud solution that utilizes the available on-premise hardware resources. Although this cloud configuration has its advantages regarding our computational requirements, it lacks the scalability and elasticity [16] offered by public clouds. In this paper we analyze the migration of our 3D terrain aware extension for the NS-2 network simulator to the private cloud environment.

#### IV. TESTING METHODOLOGY

The technology used is described in this section. Our experiments were conducted in two environments: a cloud computing environment and an on-premise environment. The hardware infrastructure consists of an Intel(R) Xeon(R) CPU X5647 @ 2.93GHz with 4 cores and 8GB RAM. The operating system is the Scientific Linux distribution. The cloud environment uses the same hardware infrastructure and the same operating system as the on-premise environment. The OpenStack cloud software is deployed, and the KVM hypervisor is used for instantiating virtual machines in the cloud.

#### A. Testing data and scenarios

Several sets of scenarios were defined as typical real-life scenarios for evaluating ad hoc wireless network performance. Two experiments were performed, each consisting of a series of test cases. For both experiments, we run the same 3D terrain aware radio propagation extension for the NS-2 network simulator, in order to obtain fair results. The latest message-passing implementation [10] is used in order to utilize the available parallel resources. In the first experiment we are interested in the execution time of the NS-2 network simulator in the on-premise environment. We evaluate the influence on the execution performance by varying the terrain resolution (2,000, 4,000, 6,000, 8,000, 10,000 and 12,000 triangles) of a given hill-like TIN-based terrain of Rhode Island, USA, obtained from webGIS [17], and the mobility scenarios with different node velocities (1, 2 and 5 m/s).

For the second experiment, we investigate the execution time of the NS-2 network simulator in the cloud environment. The applied input parameters (terrain details and node velocity) are the same as in the first experiment.

Our expectation regarding the performance is that the network simulation in the cloud computing environment will run slower than the network simulation in the conventional computing environment.

#### V. RESULTS

This section presents the results that show the performance of wireless network simulation in conventional and cloud computing.

Figure 3 depicts the inverse execution time for different terrain details. With more terrain detail, a smaller inverse time (that is, a longer execution) is obtained. It is also easy to notice that for higher node velocities the inverse time is higher, which is expected.

Figure 4 presents the results obtained from the second experiment. The inverse times are very similar to those of the first experiment, with no significant difference.



Fig. 3. Inverse execution time for conventional computing environment

In order to determine the trendline of the cloud computing environment compared to the conventional environment, we normalized the results with the inverse time of the conventional environment which is depicted in Figure 5.
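The normalization described above can be written out as a small Python sketch. The function name is hypothetical and the execution times are made-up placeholders for illustration only; they are not the measurements reported in this paper.

```python
def relative_speedup(t_conventional, t_cloud):
    """Inverse cloud time normalized by the inverse conventional time.

    Values below 1 mean the cloud run is slower than the conventional run.
    """
    return (1.0 / t_cloud) / (1.0 / t_conventional)


# Placeholder execution times in seconds, keyed by
# (number of TIN triangles, node velocity in m/s). NOT measured data.
conventional = {(2000, 1): 120.0, (12000, 1): 480.0}
cloud = {(2000, 1): 125.0, (12000, 1): 500.0}

speedup = {
    scenario: relative_speedup(conventional[scenario], cloud[scenario])
    for scenario in conventional
}
```

Because the ratio reduces to the conventional time divided by the cloud time, a value close to 1 for every scenario indicates that virtualization adds little overhead.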



Fig. 4. Inverse execution time for cloud computing environment



Fig. 5. Relative speedup of the cloud computing environment normalized over the conventional computing environment

Our expectation that running the NS-2 network simulator in the cloud is slower than in the conventional environment is confirmed. The results for the cloud network simulation with a node velocity of 2 m/s are discrepant for terrain resolutions of 4,000, 6,000 and 8,000 triangles. However, the difference from the other simulation scenarios of 1 m/s and 5 m/s is not significant.

#### VI. CONCLUSION

In this paper we analyzed the migration of the NS-2 network simulation tool, using our terrain aware extension for wireless radio propagation, to a private OpenStack cloud computing solution. The performance of the network simulation was observed, focusing on the influence of different terrain details and node velocities on the overall simulation execution time.

The obtained results show that for both experiments the simulation execution time is influenced by the node velocity. Increasing the terrain detail increases the execution time.

The comparison of the cloud computing experiment normalized by the conventional computing experiment confirms our expectation that the network simulation runs slower in the cloud. Having investigated the private cloud solution for NS-2 network simulation, as future work we plan to migrate our terrain aware extension to a public multitenant cloud environment and analyze possible influences on the overall network simulation execution time.

#### REFERENCES

- [1] Armbrust, M., et al., "A view of cloud computing", Communications of the ACM, Vol.53, No. 4, 2010, pp. 50-58.
- [2] Openstack cloud software, (retrieved September 2013), http://openstack.org.
- [3] Filiposka, S., Trajanov, D., "Terrain-aware three-dimensional radio-propagation model extension for ns-2", Simulation, Vol. 87, No. 1-2, January 2011, pp. 7-23, DOI: 10.1177/0037549710374607.
- [4] NS-2, network simulator, (retrieved November 2010), http://www.isi.edu/nsnam/ns/.
- [5] Djinevski, L., Filiposka, S., Trajanov, D., Mishkovski, I., "Accelerating wireless network simulation in 3D terrain using GPUs", Tech. Rep. SoCD:16-11, University Ss Cyril and Methodius, Skopje, Macedonia, Faculty of Information Sciences and Computer Engineering (June 2012).
- [6] Vuckovik, M., Trajanov, D., Filiposka, S., "Durkins propagation model based on triangular irregular network terrain", In: ICT Innovations 2010, pp. 333-341. Springer (2011).
- [7] Unit 39-the tin model, (retrieved November 2011), http://www.geog.ubc.ca/courses/klink/gis.notes/ncgia/u39.html.
- [8] Zeiler, M., "Modeling our world", Environmental Systems Research Institute, Inc., Redlands, California, 1999.
- [9] Djinevski, L., Filiposka, S., Mishkovski, I., Trajanov, D., "GPU performance impact of the Durkin's radio propagation algorithm", 21st Telecommunications Forum (TELFOR), IEEE Proceedings, 26-28 November, Belgrade, Serbia, 2013.

- [10] Djinevski, L., Filiposka, S., Trajanov, D., Mishkovski, I., "Message-passing terrain aware wireless network simulation", Tech. Rep. SoCD:11-13, University Ss Cyril and Methodius, Skopje, Macedonia, Faculty of Information Sciences and Computer Engineering, September 2013.
- [11] Stojanova, S., Djinevski, L., Mishkovski, I., Filiposka, S., Trajanov, D., "Micro-benchmarking NS-2 and NS-3 Network Simulators Using Terrain Aware Radio Propagation Extension", 3rd International Conference on Internet Society Technology and Management (ICIST 2013), Conference Proceedings, pp. 81-84, Kopaonik, Serbia, March 2013.
- [12] Rappaport, T. S., "Wireless Communications: Principles and Practice", Prentice Hall, New York, 2002.
- [13] Edwards, R., Durkin, J., "Computer Prediction of Service Area for VHF Mobile Radio Networks", Proceedings of the IEEE, Vol. 116, No. 9, pp. 1493-1500, 1969.
- [14] Rappaport, T. S., "Wireless Communications: Principles and Practice", Prentice Hall, New York, 2002.
- [15] Sultan, N., "Cloud computing for education: A new dawn?", International Journal of Information Management Vol. 30, No. 2, 2010, pp. 109-116.
- [16] Shtern, M., Simmons, B., Smit, M., and Litoiu, M., "An architecture for overlaying private clouds on public providers", in 8th Int. Conf. on Network and Service Management, CNSM 2012, Las Vegas, USA, 2012.
- [17] webGIS, Geographic Information System Resource, (retrieved September 2013), http://www.webgis.com/.

# Simulation of Dynamic Characteristic of *L*-branch Selection Combining Diversity Receiver in Nakagami-*m* Environment

Dragan Drača, Aleksandra Panajotović, and Nikola Sekulović

*Abstract* – In this paper, an *L*-branch selection combining (SC) diversity receiver, a powerful technique for mitigating the influence of multipath fading and cochannel interference (CCI), is considered. The average fade duration, an important dynamic characteristic of this system, is simulated in a Nakagami-*m* fading environment using the sum-of-sinusoids-based Nakagami-*m* simulator. Simulation results show very good agreement with earlier published numerical results.

*Keywords* – Fading, Selection combining diversity, Average fade duration, Sum-of-sinusoids-based simulator.

#### I. INTRODUCTION

Multipath fading due to multipath propagation, and cochannel interference (CCI) resulting from the frequency reuse that is essential for increasing cellular radio capacity, are the main factors limiting system performance [1]. Several statistical models are used to describe fading in wireless environments: Rayleigh, Nakagami-*m*, Rician, and Weibull. The Nakagami-*m* distribution contains a set of other distributions as special cases and provides good fits to data collected in indoor and outdoor environments [2]. Moreover, it can model the signal in severe, moderate, light and no-fading environments by adjusting its parameter *m*. For these reasons, a large number of papers consider the performance of wireless systems over Nakagami-*m* fading channels.

Space diversity techniques, which combine input signals from multiple receive antennas, are well-known techniques that can be used to improve transmission reliability and increase channel capacity without increasing transmission power and bandwidth [3]. The most popular space diversity techniques are selection combining (SC), equal-gain combining (EGC), and maximal-ratio combining (MRC) [4]. In contrast to MRC and EGC, the SC receiver is simpler for practical realization because it processes only one of the diversity branches. Traditionally, the SC receiver selects the branch with the highest signal-to-noise ratio (SNR), or equivalently, with the strongest signal, assuming equal noise power among the branches. However, in an interference-limited environment, the SC receiver can apply one of the following decision algorithms: the desired signal power algorithm, the total signal power algorithm, and the signal-to-interference ratio (SIR) algorithm. The desired signal power algorithm for an interference-limited SC system has the same performance as the total signal power algorithm over the entire range of average SIR [5]. In addition, implementation of the total power algorithm is the most practical among all decision algorithms, but the desired signal algorithm is easier for mathematical modelling.

In this paper, motivated by the previous observations, the dynamic characteristic of a desired signal power based *L*-branch SC diversity receiver operating over a Nakagami-*m* fading environment in the presence of CCI is modeled and simulated using the program package Matlab. The Nakagami-*m* fading simulator incorporating Pop's architecture with Zhang's decomposition algorithm is used [6]. In other words, a random phase is inserted into the low-frequency oscillators to obtain the wide-sense stationary property, while the real-valued fading figure, *m*, is decomposed into two parts, an integer and a fraction, to accomplish the design [7]. The average fade duration (AFD) of the considered system is simulated to reflect the correlation properties of fading channels and provide a dynamic representation of the system outage performance. Furthermore, simulation results are compared with previously published numerical results in papers [8], [9].

#### II. NUMERICAL RESULTS

The instantaneous SIR at the output of SC system applying desired signal power algorithm is given by  $\eta = \max\{r_1^2, r_2^2, ..., r_L^2\}/a^2 = r^2/a^2$ , where  $r_i$  is desired signal envelope on *i*-th diversity branch and *a* is CCI envelope at selected branch.

The AFD corresponds to average length of time in which envelope remains under given value, known as threshold. In interference-limited environment, the respective AFD at threshold  $\mu$ ,  $\mu = \sqrt{\eta}$ , is defined as [10]

$$T_{\mu}(\mu) = F_{\mu}(\mu) / N_{\mu}(\mu), \qquad (1)$$

where  $F_{\mu}(\mu)$  and  $N_{\mu}(\mu)$  denote the cumulative distribution function (CDF) and average level crossing rate (LCR) of the envelope ratio, respectively. The average LCR of the envelope ratio of desired signal and CCI,  $\mu$ , at threshold  $\mu_{th}$  is defined as the rate at which a fading process crosses level  $\mu_{th}$  in a positive (or negative) going direction and is mathematically defined by the Rice's formula [10]

$$N_{\mu}(\mu_{th}) = \int_{0}^{\infty} \dot{\mu} p_{\mu\dot{\mu}}(\mu_{th}, \dot{\mu}) d\dot{\mu}, \qquad (2)$$

where  $\dot{\mu}$  denotes the time derivative of  $\mu$  and  $p_{\mu\dot{\mu}}(\mu,\dot{\mu})$  is the joint probability density function (PDF) of random variables  $\mu(t)$  and  $\dot{\mu}(t)$  in an arbitrary moment *t*.
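For sampled data, definitions (1) and (2) have simple empirical counterparts, sketched below in Python. This is a minimal illustration and not the Matlab code used in the paper: the function names are hypothetical, the envelopes are assumed to be uniformly sampled with step `dt`, the CDF is estimated as the fraction of samples below the threshold, and the LCR as the number of downward crossings per second.

```python
import numpy as np


def sc_envelope_ratio(r, a):
    """Envelope ratio mu = sqrt(eta) = max_i(r_i) / a at the SC output.

    r: array of shape (L, T) with desired-signal envelopes per branch.
    a: array of shape (T,) with the CCI envelope at the selected branch.
    """
    return np.max(r, axis=0) / a


def empirical_afd(mu, dt, threshold):
    """Estimate AFD = CDF / LCR from a uniformly sampled envelope ratio."""
    below = mu < threshold
    cdf = below.mean()                       # fraction of time in a fade
    # A downward crossing: sample above (or at) threshold, next one below.
    crossings = np.count_nonzero(~below[:-1] & below[1:])
    if crossings == 0:
        return float("nan")                  # no fade observed, AFD undefined
    lcr = crossings / (dt * (len(mu) - 1))   # crossings per second
    return cdf / lcr


# Deterministic check signal: a 1 Hz cosine around level 2 spends half of
# every period below 2, so the AFD at that threshold should be about 0.5 s.
t = np.arange(0.0, 10.0, 1e-3)
afd_check = empirical_afd(2.0 + np.cos(2.0 * np.pi * t), dt=1e-3, threshold=2.0)
```

The estimate depends on the sampling step: `dt` has to be small compared with the shortest fades of interest, otherwise short fades are missed and the AFD is overestimated.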

Expressions for the average LCR of dual and triple SC diversity system applying desired signal power decision algorithm over Nakagami-m fading channels in the presence of CCI are presented in [11], [9] as

$$N_{\mu}(\mu_{th}) = \sum_{k=0}^{\infty} \frac{\sqrt{2\pi} f_{m} m^{m+k-0.5} m_{I}^{m_{I}-0.5} \rho^{k} \mu_{th}^{2m+2k-1}}{k! \Gamma(m) \Gamma(m_{I}) (1-\rho)^{k}} \times \left[ \frac{2\beta \Gamma(m+m_{I}+k-0.5)}{(\delta+\gamma)^{m+m_{I}+k-0.5}} - \sum_{n=0}^{m+k-1} \frac{m^{n} \mu_{th}^{2n} \Gamma(m+m_{I}+n+k-0.5)}{n! (1-\rho)^{n}} + \frac{2\beta}{\Omega_{s}^{n} (2\delta+\gamma)^{m+m_{I}+k+n-0.5}} \right]$$
(3)

and

$$\begin{split} N_{\mu}(\mu_{h}) &= \frac{\sqrt{2\pi} f_{m} \mu_{h}^{2m-1} m_{l}^{m_{l}-0.5} m^{m-0.5} S^{m_{l}-0.5}}{\Gamma(m_{l})} \\ &\times \sqrt{Sm_{l} + m\mu_{h}^{2}} \sum_{i,j=0}^{\infty} \theta^{j} \alpha \Biggl[ \frac{2\Gamma(i+j+m)}{\Gamma(j+m)(1+\rho)^{i+j+m}} \\ &\times \Biggl( \frac{\Gamma(j+m+m_{l}-0.5)}{\alpha_{1}^{j+m+m_{l}-0.5}} \Biggr] \\ &- \sum_{k=0}^{i+m-1} \frac{\Gamma(j+m+m_{l}+k-0.5) \theta^{k}}{k! \alpha_{2}^{j+m+m_{l}+k-0.5}} \\ &- \sum_{l=0}^{j+i+m-1} \frac{\Gamma(j+m+m_{l}+k-0.5) \theta^{l}(1+\rho)^{l}}{l! \alpha_{3}^{j+m+m_{l}+l-0.5}} \Biggr] \\ &+ \sum_{k=0}^{i+m-1} \sum_{l=0}^{j+i+m-1} \frac{\Gamma(j+m+m_{l}+k+l-0.5) \theta^{k+l}(1+\rho)^{l}}{k! l! \alpha_{4}^{j+m+m_{l}+k+l-0.5}} \Biggr] \\ &+ \theta^{l} \Biggl( \frac{\Gamma(i+j+m+m_{l}-0.5)}{\alpha_{5}^{i+j+m+m_{l}-0.5}} \Biggr] \\ &- \sum_{l=0}^{j+m-1} \frac{\Gamma(i+j+m+m_{l}+k-0.5) \theta^{l}}{k! \alpha_{3}^{i+j+m+m_{l}+k-0.5}} \Biggr] \\ &+ \sum_{k=0}^{i+m-1} \frac{\Gamma(i+j+m+m_{l}+k-0.5) \theta^{k}}{k! \alpha_{3}^{i+j+m+m_{l}+k-0.5}} \Biggr]$$

respectively, where $f_m$ is the Doppler shift frequency, $\rho$ is the correlation coefficient, *m* and *m<sub>I</sub>* are the Nakagami parameters describing the fading severity of the desired signal and CCI, respectively, the average SIR is $S = \Omega_s / \Omega_I$ and

$$\begin{gathered} \delta = \frac{m\mu_{th}^{2}}{\Omega_{s}\left(1-\rho\right)}, \quad \beta = \frac{\sqrt{m_{I}\Omega_{s} + m\Omega_{I}\mu_{th}^{2}}}{\Omega_{s}^{m+k}\Omega_{I}^{m_{I}}}, \quad \gamma = \frac{m_{I}}{\Omega_{I}}, \quad \chi = m_{I}S, \\ \theta = m\mu_{th}^{2}/(1-\rho), \quad \alpha = \rho^{i+j}/(i!j!\Gamma(m)), \quad \alpha_{1} = \chi + \theta, \\ \alpha_{2} = \chi + 2\theta, \quad \alpha_{3} = \chi + (2+\rho)\theta, \quad \alpha_{4} = \chi + (3+\rho)\theta, \\ \alpha_{5} = \chi + (1+\rho)\theta. \end{gathered}$$

The outage probability of the output SIR envelope,  $F_{\mu}(\mu_{th})$ , of the proposed dual and triple-branch SC diversity system can be obtained using [8], [9]

$$F_{\mu}(\mu_{ih}) = 1 - \sum_{k=0}^{\infty} \frac{\rho^{k}(1-\rho)^{m}}{k!\Gamma(m)}$$

$$\times \left[ 2\Gamma(m+k) - m^{m+k}\mu_{ih}^{-2m+2k} \sum_{p=0}^{m_{f}-1} \frac{m_{I}^{-p}(1-\rho)^{p}\Gamma(m+k+p)}{p!} + \frac{\Omega_{S}^{p}\Omega_{I}^{m+k}}{(m\Omega_{I}\mu_{ih}^{-2} + m_{I}\Omega_{S}(1-\rho))^{m+k+p}} + \frac{\Omega_{S}^{p}\Omega_{I}^{m+k}}{(m\Omega_{I}\mu_{ih}^{-2} + m_{I}\Omega_{S}(1-\rho))^{m+k+p}} \right] - \sum_{k=0}^{\infty} \sum_{l=0}^{m+k-1} \frac{\rho^{k}(1-\rho)^{m}}{k!l!\Gamma(m)} \left[ \frac{\Gamma(m+k+l)}{2^{m+k+l-1}} - m^{m+k+l}\mu_{ih}^{-2m+2k+2l} \sum_{p=0}^{m_{f}-1} \frac{m_{I}^{-p}(1-\rho)^{p}\Gamma(m+k+l+p)}{p!} + \frac{2\Omega_{S}^{p}\Omega_{I}^{m+k+l}}{(2m\Omega_{I}\mu_{ih}^{-2} + 2m_{I}(1-\rho))^{m+k+p+l}} \right]$$
(5)

and

$$F_{\mu}(\mu_{th}) = 1 - \left\{ \sum_{i,j=0}^{\infty} \alpha \left[ \frac{2\Gamma(i+j+m)}{\Gamma(j+m)(1+\rho)^{i+j+m}} \right] \\ \times \left( (1-\rho)^{m} \left( (j+m-1)! - \sum_{k=0}^{i+m-1} \frac{(j+m+k-1)!}{2^{j+m+k}k!} \right) \right] \\ - \sum_{l=0}^{i+j+m-1} \frac{(1+\rho)^{l}}{l!} \left( \frac{(j+m+l-1)!}{(2+\rho)^{j+m+l}} - \sum_{k=0}^{i+m-1} \frac{(k+l+j+m-1)!}{k!(3+\rho)^{k+l+j+m}} \right) \right] \\ - \sum_{p=0}^{m_{1}-1} \frac{\chi^{p} (m\mu^{2})^{j+m}}{p!(1-\rho)^{j}} \left( \frac{(j+m+p-1)!}{\alpha_{1}^{j+p+m}} - \sum_{k=0}^{i+m-1} \frac{(j+m+p+k-1)!\rho^{k}}{k!\alpha_{2}^{j+m+p+k}} \right) \right\}$$

$$-\sum_{l=0}^{i+j+m-1} \frac{(1+\rho)^{l} \theta^{l}}{l!} \times \left( \frac{(j+m+p+l-1)!}{\alpha_{3}^{j+m+p+l}} - \sum_{k=0}^{j+m-1} \frac{(p+k+l+j+m-1)!\theta^{k}}{k!\alpha_{4}^{p+k+l+j+m}} \right) \right) \right) + (1-\rho)^{m} \left( \frac{(i+j+m-1)!}{(1+\rho)^{i+j+m}} - \sum_{l=0}^{j+m-1} \frac{(i+j+m+l-1)!}{l!(2+\rho)^{i+j+m+l}} - \sum_{l=0}^{j+m-1} \frac{(i+j+m+k+l-1)!}{l!(3+\rho)^{i+j+m+l}} \right) \right) - \sum_{k=0}^{m_{r-1}} \frac{\chi^{p} \theta^{i+j} \left( m\mu_{th}^{2} \right)^{m}}{p!} \times \left( \frac{(i+j+m+p-1)!}{\alpha_{5}^{i+j+m+p}} - \sum_{l=0}^{j+m-1} \frac{(i+j+m+l+p-1)!\theta^{l}}{l!\alpha_{3}^{i+j+m+l+p}} - \sum_{l=0}^{j+m-1} \frac{(i+j+m+l+p-1)!\theta^{l}}{l!\alpha_{3}^{i+j+m+l+p}} - \sum_{l=0}^{j+m-1} \frac{(i+j+m+k+l+p-1)!\theta^{l}}{l!\alpha_{3}^{i+j+m+l+p}} - \frac{(i+j+m+k+l+p-1)!\theta^{l}}{l!\alpha_{4}^{i+j+m+k+p}} \right) \right) \right]$$

$$(6)$$



The architecture of the sum-of-sinusoids-based Nakagami-*m* simulator is depicted in Fig. 1 [7].



Fig. 1. The block diagram of sum-of-sinusoids-based Nakagami-*m* simulator

The corresponding composite signal is

$$g(t) = \sqrt{\gamma \sum_{k=1}^{p} g_{I,k}^{2}(t) + \beta g_{Q}^{2}(t)}, \qquad (7)$$



Fig. 2. The algorithm for simulation of AFD of considered *L*-branch SC receiver

where

$$g_{I}(t) = 2\sqrt{\frac{2}{N}} \left[\sum_{n=1}^{M} \cos \Phi_{n} \cos \left(\omega_{n}t + \Psi_{n}\right) + \sqrt{2} \cos \Phi_{N} \cos \left(\omega_{N}t + \Psi_{N}\right)\right], \tag{8}$$

$$g_{Q}(t) = 2\sqrt{\frac{2}{N}} \left[\sum_{n=1}^{M} \sin \Phi_{n} \cos \left(\omega_{n}t + \Psi_{n}\right) + \sqrt{2} \sin \Phi_{N} \cos \left(\omega_{N}t + \Psi_{N}\right)\right], \tag{9}$$

$$\gamma = \frac{2pm \pm \sqrt{2pm(1+p-2m)}}{p(1+p)} \tag{10}$$

and

$$\beta = 2m - \gamma p \tag{11}$$

with $p = [2m]$ (the integer part of $2m$), $N = 4M+2$, $\omega_n = 2\pi f_m \cos(2\pi n/N)$, $\Phi_n = n\pi/M$, $\Phi_N = 0$, and $\Psi_n$ random phases uniformly distributed in the range $(-\pi, \pi]$.
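The composite signal defined by (7)-(11) can be sketched in Python as follows. This is an illustrative sketch, not the Matlab implementation used for the results: the function names are hypothetical, each of the $p$ in-phase components is assumed to use its own independent set of random phases, the "+" root of (10) is chosen by default, and $m \geq 0.5$ is assumed so that $p \geq 1$.

```python
import numpy as np


def decomposition_weights(m, sign=1.0):
    """Return (p, gamma, beta) from (10)-(11); requires m >= 0.5."""
    p = int(np.floor(2.0 * m))                      # integer part of 2m
    disc = max(2.0 * p * m * (1.0 + p - 2.0 * m), 0.0)
    gamma = (2.0 * p * m + sign * np.sqrt(disc)) / (p * (1.0 + p))
    beta = max(2.0 * m - gamma * p, 0.0)            # guard against rounding
    return p, gamma, beta


def sos_component(t, M, f_m, rng, quadrature=False):
    """One sum-of-sinusoids component, g_I (default) or g_Q, as in (8)-(9)."""
    N = 4 * M + 2
    n = np.arange(1, M + 1)
    omega = 2.0 * np.pi * f_m * np.cos(2.0 * np.pi * n / N)
    phi = n * np.pi / M
    psi = rng.uniform(-np.pi, np.pi, size=M + 1)    # last entry is Psi_N
    weights = np.sin(phi) if quadrature else np.cos(phi)
    total = weights @ np.cos(np.outer(omega, t) + psi[:M, None])
    # Extra oscillator with index N: omega_N = 2*pi*f_m and Phi_N = 0.
    phi_N = 0.0
    last_weight = np.sin(phi_N) if quadrature else np.cos(phi_N)
    total = total + np.sqrt(2.0) * last_weight * np.cos(2.0 * np.pi * f_m * t + psi[M])
    return 2.0 * np.sqrt(2.0 / N) * total


def nakagami_envelope(t, m, M=16, f_m=100.0, seed=0):
    """Composite envelope g(t) from (7) for fading figure m."""
    rng = np.random.default_rng(seed)
    p, gamma, beta = decomposition_weights(m)
    g_i_sq = sum(sos_component(t, M, f_m, rng) ** 2 for _ in range(p))
    g_q = sos_component(t, M, f_m, rng, quadrature=True)
    return np.sqrt(gamma * g_i_sq + beta * g_q ** 2)


t = np.arange(0.0, 0.01, 1e-5)        # 10 ms of samples, 10 us apart
g = nakagami_envelope(t, m=1.2)
```

For integer $2m$ the weight $\beta$ vanishes with the "+" root, so the envelope reduces to the root of a sum of $p$ squared in-phase components; the fractional part of $2m$ is carried by the $\beta g_Q^2(t)$ term.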

Figure 2 describes the AFD simulation process for the desired signal based SC system operating in an interference-limited Nakagami-*m* environment.

Figures 3 and 4 show simulation and numerical results, evaluated using the program packages Matlab and Mathematica, respectively, for uncorrelated ( $\rho \rightarrow 0$ ) dual and triple SC diversity systems [9] in environments with different fading severity.



Fig. 3. AFD of dual SC diversity system



Fig. 4. AFD of triple SC diversity system

Very good agreement between numerical and simulation results is evident regardless of the number of diversity branches or the fading severity.

To achieve greater precision, the number of oscillators is chosen as M = 500. In all simulations the maximum Doppler frequency is  $f_m = 100$  Hz, for which a time step of  $\Delta t = 10 \ \mu s$  is selected.

#### IV. CONCLUSION

This work is an extension of [12] and results from the intention to verify previously published theoretical results. The AFD, an important dynamic performance characteristic, is simulated for an SC diversity system with two and three uncorrelated branches in a Nakagami-*m* fading environment in the presence of CCI. Simulation results obtained using the program package Matlab show very good agreement with earlier published numerical results calculated using the program package Mathematica.

#### ACKNOWLEDGEMENT

This work has been funded by the Serbian Ministry of Education and Science under the projects TR-32052, III-44006 and TR-33035.

#### REFERENCES

[1] Parsons, J. D., "*The Mobile Radio Propagation Channels*", 2nd ed. New York: Wiley, 2000.

[2] Nakagami, M., "Statistical methods in radio wave propagation. The m-distribution – A general formula of intensity distribution of rapid fading", Oxford: Ed. Pergamon, 1960.

[3] Goldsmith, A., "Wireless communications", Cambridge University Press: New York, 2005.

[4] Simon, M. K., Alouini, M.-S., "Digital Communication over Fading Channels", 1st ed. New York: Wiley, 2000.

[5] Yang, L., Alouini, M. -S., "Wireless communications systems and networks, Average outage duration of wireless communication systems (ch. 8)", US: Springer, 2004.

[6] Zhang, Q. T., "A decomposition technique for efficient generation of correlated Nakagami fading channels", IEEE Journal of Selected Areas in Communications, Vol. 18, No. 11, 2000, pp. 2385-2392.

[7] Wu, T-M., Tzeng, S-Y., "Sum-of-sinusoids-based Simulator for Nakagami-m fading channels", 58th IEEE Vehicular Technology Conference, 2003. VTC 2003-Fall, Vol. 1, 2003, pp. 158-162.

[8] Panajotović, A., Sekulović, N., Stefanović, M., Drača, D., Stefanović, Č., "Average Fade Duration of Dual Selection Diversity over Correlated Unbalanced Nakagami-m Fading Channels in the Presence of Cochannel Interference", Frequenz, vol. 67, no. 11-12, pp. 393-398, 2013.

[9] Panajotović, A., Stefanović, M., Drača, D., Sekulović, N., Stefanović, D., "Second Order Statistics of Triple Selection Diversity over Correlated Nakagami-m Fading Channels in the Presence of Cochannel Interference", Telecommunication Systems, accepted for publication, June 2013.

[10] Dong, X., Beaulieu, N.C., "Average level crossing rate and average fade duration of selection diversity", IEEE Communication Letters, vol. 5, no. 10, 2001, pp. 396-398.

[11] Panajotović, A., Sekulović, N., Stefanović, M., Drača, D., "Average level crossing rate of dual selection diversity over correlated unbalanced Nakagami-m fading channels in the presence of cochannel interference", IEEE Communication Letters, vol. 16, no. 5, 2012.

[12] Stefanović, M., Drača, D., Sekulović, N., Panajotović, A., "Modeling and Simulation of L-branch Selection Combining Diversity Receiver in Nakagami-m Environment using Matlab", Conference Proceedings of SSSS 2012, pp. 115-118, Niš, Serbia, 2012.

# Single phase system for detection of harmonic pollution sources at power grid

### Dejan Stevanović, Predrag Petković and Volker Zerbe

*Abstract:* This paper presents a system for harmonic source detection in the power grid. It is implemented on an Altera DE2 board. In combination with a commercial power meter it represents a powerful tool that allows the utility to find each harmonic producer (nonlinear load). The system is based on the equation for distortion power calculation according to the Budeanu definition. This equation proved to be the best indicator of whether or not a harmonic producer exists in the power grid. Measurement results obtained using this system on different types of light bulbs confirmed our theory.

Key Words: Distortion power, FPGA, utility, detecting harmonic producer

#### I. INTRODUCTION

The last few decades have been characterized by rapid development of electronics that changed the profile of the common customer's load. New electronic appliances are characterized by high sophistication and high energy efficiency. Moreover, these devices reduce the emission of carbon dioxide and bring the customer a smaller bill for consumed energy. At first glance everything looks great, but that is not the case. The main drawback of these devices is the rich harmonic content of the load current, which causes many unwanted problems on both the utility and the customer side [1], [2], [3]. These harmonics are a result of the modern design of such devices. Namely, they operate at DC voltage while being supplied from 230 V RMS AC. Contemporary AC/DC converters are based on the switching operation of transistors at frequencies up to several kHz. As a result, more power goes to the loads (electronic equipment) and less is dissipated in the AC/DC converters. This has solved the problem of energy efficiency, but another problem arose: all such loads introduce harmonics into the current. The increased number of AC/DC converters connected to the power grid has caused the total distorted current to reach a very high level. Therefore, despite the low resistance of power lines, it jeopardizes the core of the power grid, the integrity of the grid voltage. As a result, the utility faces the problem of increased losses [4], [5]. In order to save the system, many regulatory organizations introduced standards that restrict the allowed amount of each harmonic. Two widely known standards in this area are the IEEE 519-1992 and the IEC 61000 series. However, these standards define no method or index for the detection of dominant harmonic sources. Many authors have tried to find the best solution, and consequently many different methods have been developed so far. Table I presents the state of the art.

Dejan Stevanović is with the Innovation Centre of Advanced Technologies CNT lmt (ICNT), Vojvode Mišića 58/2, 18000 Niš, Serbia, E-mail: dejan.stevanovic@icnt.rs.

Predrag Petković is with the Department of Electronics, Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, 18000 Niš, Serbia, E-mail: predrag.petkovic@elfak.ni.ac.rs.

Volker Zerbe is with the Department Computer Engineering / Embedded Systems, Altonaer Strasse 25, 99085 Erfurt, Germany, E-mail: volker.zerbe(at)fh-erfurt.de

 
 TABLE I

 State of the art in methods for harmonic source detection and sharing of harmonic responsibility

| Method (Indices) | Required Data | Aims |
|------------------|---------------|------|
| Active Power Direction (APD) [6] | Voltage and current obtained by single-point measurement | Harmonic source detection |
| Reactive Power Direction (RPD) [6] | Voltage and current obtained by single-point measurement | Harmonic source detection |
| Nonactive Power Method (NP) [7], [8] | Voltage and current obtained by single-point measurement | Harmonic source detection |
| Harmonic Polluted Ranking (HPR) method [9] | Current obtained by single-point measurement | Harmonic source detection |
| Critical Impedance Method (CI) [10] | Thevenin's equivalents of utility and consumer sides | Sharing harmonic responsibility between utility and consumer |
We believe that a simple, inexpensive and applicable solution exists. It relies on an efficient method for detecting and measuring harmonic pollution at the grid user's connection point. This paper explains and demonstrates the solution in the following sections.

The next section describes the basic principle of operation of an electronic power meter. The third section explains the hardware realization of the system for detecting harmonic sources in the power grid. The measured results are presented in the fourth section, before the conclusion.

#### II. THE THEORY OF OPERATION OF ELECTRONIC POWER METERS

The core of each electronic power meter is a chip that calculates all power quantities the utility needs to control consumption and create bills. These quantities are usually defined by the appropriate standard. All such circuits rely on digital signal processing of voltage and current samples. The instantaneous voltage and current are attenuated through a voltage divider and a current transformer, respectively. The signal at the attenuator output is fed to an ADC, where it is sampled at discrete time points (more than two per period of the highest frequency of interest, according to the Nyquist-Shannon theorem) and digitized. A DSP processes the digital voltage and current samples and calculates all necessary power quantities. The instantaneous value of a signal (current or voltage) in the time domain can be expressed as:

$$x(t) = \sqrt{2} X_{RMS} \cdot \cos(2\pi f t + \varphi) \quad . \tag{1}$$

After the discretization in equidistant time intervals it transforms to:

$$x(nT) = \sqrt{2}X_{RMS} \cdot \cos\left(2\pi \frac{f}{f_{sample}}n + \varphi\right) \quad , \tag{2}$$

where $f$ and $f_{sample}$ are the frequency of the signal and the sampling frequency, respectively. The RMS value is calculated using the following equation:

$$X_{RMS} = \sqrt{\frac{\sum_{n=1}^{N} x(nT)^{2}}{N}} . \tag{3}$$
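As a minimal sketch of the sampling and RMS equations above, the following Python fragment generates samples of a sinusoid and recovers its RMS value. The function names and the numeric values (220 V, 50 Hz, 4 kHz sampling, 80 samples) are illustrative assumptions, not parameters of the meter described in this paper.

```python
import math

def sample_sine(x_rms, f, f_sample, n_samples, phase=0.0):
    """Samples x(nT) of a sinusoid with RMS value x_rms, for n = 1..n_samples."""
    return [math.sqrt(2) * x_rms * math.cos(2 * math.pi * f / f_sample * n + phase)
            for n in range(1, n_samples + 1)]

def rms(samples):
    """RMS value: square root of the mean of the squared samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

# Assumed example: 50 Hz signal sampled at 4 kHz, exactly one period (80 samples)
v = sample_sine(x_rms=220.0, f=50.0, f_sample=4000.0, n_samples=80)
v_rms = rms(v)
```

When the window covers a whole number of periods, as assumed here, the computed value matches the RMS value used to generate the samples up to rounding error.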

The active power is obtained as the average of the products of the instantaneous voltage and current samples:

$$P = \frac{\sum_{n=1}^{N} v(nT)i(nT)}{N} = \frac{\sum_{n=1}^{N} p(nT)}{N}. \tag{4}$$

The same equation is used for the reactive power calculation; the only difference is that the voltage samples are phase-shifted by $\pi/2$.
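The averaging in Eq. (4) and the quarter-period voltage shift can be sketched as follows. This is only an illustration for a purely sinusoidal case: the sample count per period, the amplitudes and the 30° phase lag are assumed values, and the shift is implemented as a circular delay over one stored period, which is one of several possible ways to realize the $\pi/2$ shift.

```python
import math

N = 80                        # samples per mains period (assumed)
V_RMS, I_RMS = 220.0, 0.5     # illustrative RMS values
PHI = math.radians(30)        # assumed: current lags voltage by 30 degrees

theta = [2 * math.pi * n / N for n in range(N)]
v = [math.sqrt(2) * V_RMS * math.cos(t) for t in theta]
i = [math.sqrt(2) * I_RMS * math.cos(t - PHI) for t in theta]

def mean_product(a, b):
    """Average of sample-by-sample products, as in Eq. (4)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

quarter = N // 4
# Delay the voltage by a quarter period (pi/2) using a circular shift
v_shifted = v[-quarter:] + v[:-quarter]

P = mean_product(v, i)           # active power
Q = mean_product(v_shifted, i)   # reactive power from shifted voltage samples
S = V_RMS * I_RMS                # apparent power
```

For this undistorted case $P = S\cos\varphi$ and $Q = S\sin\varphi$, so $S^2 - P^2 - Q^2$ vanishes; a nonzero remainder appears only when harmonics are present.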

In addition, the apparent power *S* can be calculated as the product of the RMS voltage and current values:

$$S = V_{RMS} \cdot I_{RMS} \tag{5}$$

Some error in the active and reactive power calculation is possible. It is caused by the phase difference between voltage and current and by the fact that the power line frequency varies slightly around its nominal value (50 Hz). These errors can be eliminated or reduced by additional calibration and correction within appropriate filters.

Once P is calculated according to Eq. (4), Q is calculated in a similar way using shifted voltage samples, and the apparent power S is obtained using Eq. (5), one can easily compute the distortion power using Budeanu's definition:

$$D = \sqrt{S^2 - P^2 - Q^2} \,. \tag{6}$$
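As a quick numerical check of Eq. (6), the sketch below evaluates it for two rows of Table II. The helper name `distortion_power` is hypothetical, and clamping a slightly negative radicand to zero is an assumption made here to handle rounding in the meter readings (as in the incandescent lamp row, where S and P are equal and Q is small but nonzero).

```python
import math

def distortion_power(s, p, q):
    """Budeanu distortion power D = sqrt(S^2 - P^2 - Q^2), per Eq. (6).

    A negative radicand caused by rounding of S, P and Q is clamped
    to zero (an assumption of this sketch).
    """
    radicand = s * s - p * p - q * q
    return math.sqrt(max(radicand, 0.0))

# FL18W row of Table II: S = 17.49, P = 11.33, Q_B = -5.80
d_fl = distortion_power(17.49, 11.33, -5.80)

# Incandescent lamp row of Table II: S = 91.96, P = 91.96, Q_B = 0.74
d_inc = distortion_power(91.96, 91.96, 0.74)
```

The first value comes out close to the 11.99 reported in Table II, and the second is zero, in agreement with the linear load.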

Unfortunately, existing regulations do not require power meters to calculate the components of apparent power. Therefore, direct implementation of Eq. (6) for distortion power is not possible in most commercially available meters.

Despite some debate about the accuracy of Eq. (6), to date it is the only definition with real practical application. All others are too complicated to be implemented at a commercial level. In practice, many authors have confirmed that the value of distortion power defined by Eq. (6) directly follows the total harmonic distortion of the current [11]. As stated at the beginning, harmonics cause many unwanted problems on both the customer and the utility side. To reduce the level of harmonics, one needs an instrument to measure them. It would be very convenient if such an instrument could use the data measured by a commercial electronic power meter. The following section describes our solution.

### III. SYSTEM FOR DETECTION OF HARMONIC POLLUTION SOURCES IN THE POWER GRID

So far, several methods for harmonic source detection in the power grid have been published [6], [7], [8], [9], [10]. None of them can give precise information about the pollution produced by a single customer. That is one of the main drawbacks of the existing methods. Moreover, these methods cannot be easily implemented in an ordinary power meter. Therefore, we offer a solution based on Eq. (6). As we have shown in a previously published paper [12], the value of distortion power is a good indicator of an existing harmonic source in the grid. Bearing in mind the expense of chip redesign and/or modification of power meters in service, we suggest an upgrade that can be easily fitted to existing power meters. Hence, our idea is to realize hardware that can be attached as a dongle to any electronic meter without changing its construction; the system is simply connected to the power meter. The system is implemented on an Altera DE2 board with an Altera Cyclone® II 2C35 FPGA. The block diagram of the realized system is shown in Fig. 1. It consists of an RS232 interface, RAM, ROM, two address generator blocks (one for each type of memory), an FSM, and a circuit for distortion power calculation. The data transfer is controlled by the FSM. The ROM stores the commands that are sent to the power meter, while the RAM stores the received data.



Fig. 1. Block diagram of the realized system

The communication between the Altera DE2 board, where the system is realized, and the power meter is done through the RS232 and optical ports. Communication takes place in two steps. In the first step, the system sends a command that requests the voltage and current data from the power meter and saves them in RAM. Then the system requests and stores the data for active and reactive power. All received data arrive in BCD format. Therefore, it is necessary to convert them to binary (HEX) format using the Shift and Sub-3 algorithm and to extend them to 24-bit words. The principle of the conversion algorithm is simple: the 24-bit BCD number is divided into 4-bit groups that represent the hundreds of thousands, tens of thousands, thousands, hundreds, tens, and units. The whole number is shifted right by one bit, and the bit shifted out becomes the next bit of the binary result. After the shift, 3 is subtracted from every 4-bit group that is greater than or equal to 8. The process is repeated 24 times.
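The BCD-to-binary conversion can be sketched in software as follows. This is a behavioural model of one common shift-and-subtract-3 formulation (the reverse of the "double dabble" method), in which the correction is applied after each right shift; it is an assumption that the realized VHDL circuit follows the same ordering. The function name and the packed-BCD integer representation are hypothetical.

```python
def bcd_to_binary(bcd, digits=6):
    """Convert a packed BCD value (4 bits per decimal digit) to binary.

    Each iteration shifts the BCD register right by one bit, moves the
    shifted-out bit into the binary result, and then subtracts 3 from
    every 4-bit group that is >= 8.
    """
    width = 4 * digits          # 24 iterations for six BCD digits
    binary = 0
    for k in range(width):
        binary |= (bcd & 1) << k    # shifted-out bit becomes result bit k
        bcd >>= 1
        for d in range(digits):
            nibble = (bcd >> (4 * d)) & 0xF
            if nibble >= 8:
                bcd -= 3 << (4 * d)  # correct the group after the shift
    return binary

# Packed BCD 0x123456 represents the decimal number 123456
value = bcd_to_binary(0x123456)
```

The correction is needed because a bit that crosses from one decimal digit into the next lower one arrives with weight 8, while its true contribution after halving is 5.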

#### A. RS232 Interface

RS232 is a widely used asynchronous serial interface introduced by the Electronic Industries Association (EIA) for the interchange of data between two devices. It was initially developed by the EIA to standardize the connection of computers with telephone line modems and later became an inevitable part of electronic equipment. Moreover, it has become a standard communication protocol integrated into processors and microcontrollers. The interface works in combination with a UART (universal asynchronous receiver/transmitter). When transmitting a byte, the UART (serial port) first sends a START bit, which is a positive voltage (logic 0), followed by the data (generally 8 bits, but possibly 5, 6, or 7 bits), followed by one or two STOP bits, which are a negative voltage (logic 1). The RS-232 standard specifies that logic "1" is sent as a voltage in the range -15 to -5 V and that logic "0" is sent as a voltage in the range +5 to +15 V. The standard also defines that a voltage with an amplitude of at least 3 V will always be recognized correctly at the receiver according to its polarity. Therefore, it tolerates appreciable attenuation along the transmission line. The waveform of the transmitted signal at the UART Tx pin is shown in Fig. 2.



Fig. 2. Waveform of a transmitted byte
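The framing described above can be illustrated with a short sketch that builds the bit sequence for one byte and maps it to RS-232 line levels. The function names, the absence of a parity bit, and the ±12 V level are assumptions made for illustration; data bits are sent least significant bit first, as is usual for a UART.

```python
def uart_frame(byte, data_bits=8, stop_bits=1):
    """Return the logic levels of one UART frame: start bit, data (LSB first), stop bit(s)."""
    bits = [0]                                            # START bit is logic 0
    bits += [(byte >> k) & 1 for k in range(data_bits)]   # data, least significant bit first
    bits += [1] * stop_bits                               # STOP bit(s) are logic 1
    return bits

def rs232_levels(bits, amplitude=12.0):
    """Map logic levels to line voltages: logic 1 is negative, logic 0 is positive."""
    return [-amplitude if b else amplitude for b in bits]

frame = uart_frame(0x55)
volts = rs232_levels(frame)
```

For the byte 0x55 the frame alternates on every bit, which makes it a convenient pattern for checking the baud rate on an oscilloscope.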

The baud rate of the transmitted word is device-dependent. It is usually in the range from 300 to 230400 bit/s. The structure of the realized RS232 interface is very simple. It is based on two shift registers. The first shift register accepts the input data at the TxData(7:0) port and automatically serializes and emits the byte on the Tx pin. While data are being emitted on the Tx pin, the IntTx pin is reset to indicate that the transmission is not complete. Therefore, a rising edge on IntTx\_O can trigger the interrupt line of a microcontroller to emit another byte. The second shift register deserializes the data received on the Rx pin. When the received bit stream is deserialized, the IntRx pin is set. This announces that the received byte can be read on the data output bus RxData. As soon as the byte is read, IntRx is reset.

#### B. Distortion power calculation

The block diagram of the circuit for distortion power calculation is shown in Fig. 3. Note that this part of the VHDL code can be used as a predesigned IP core ready to be embedded into an integrated power meter IC.



Fig. 3. Block diagram of the circuit for distortion power calculation

The serial multiplier successively accepts the 24-bit values of active, reactive and apparent power from 24-bit registers. The multiplier is realized using an iterative method. The algorithm is fast and its hardware implementation occupies a small chip area. A complete multiplication takes a number of clock pulses equal to the operand length in bits. After the multiplication, the AddSub block adds $P^2$ and $Q^2$ and subtracts the sum from $S^2$. Finally, the square root block calculates D.
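A behavioural sketch of such an iterative multiplier followed by the AddSub step is given below. It assumes the classic shift-and-add scheme, one partial product per clock pulse; the paper does not state which iterative scheme the VHDL design uses, so this is an illustration rather than a description of the realized circuit. The function names and the integer test values are hypothetical.

```python
def serial_multiply(a, b, width=24):
    """Shift-and-add multiplication of two unsigned width-bit integers.

    One loop iteration models one clock pulse, so a full multiplication
    takes `width` iterations.
    """
    acc = 0
    for _ in range(width):
        if b & 1:        # add the shifted multiplicand when the current bit is 1
            acc += a
        a <<= 1
        b >>= 1
    return acc

def radicand(p, q, s, width=24):
    """AddSub step: S^2 - (P^2 + Q^2), using the serial multiplier for the squares."""
    p2 = serial_multiply(p, p, width)
    q2 = serial_multiply(q, q, width)
    s2 = serial_multiply(s, s, width)
    return s2 - (p2 + q2)

# Illustrative integer magnitudes (e.g. power values scaled by 100)
r = radicand(p=1133, q=580, s=1749)
```

Only magnitudes enter the squares, so the sign of the reactive power does not need to be carried through the multiplier.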

The circuit for square root computation is realized using an iterative method, the longhand square root computation method. This algorithm is fast and its hardware implementation requires a small chip area. It computes the square root in the same way as it is done by hand. Detailed information about the realization of this circuit can be found in [13].
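The longhand method can be sketched in its binary form, where the radicand is processed two bits at a time and one result bit is produced per step. This is a generic textbook version given for illustration; the actual circuit is described in [13] and may differ in detail. The function name and the 48-bit default width (the size of a product of two 24-bit values) are assumptions.

```python
import math

def isqrt_longhand(n, width=48):
    """Integer square root by the binary longhand (digit-by-digit) method.

    `width` is the number of bits of the radicand and must be even.
    """
    root = 0
    rem = 0
    for k in range(width // 2 - 1, -1, -1):
        rem = (rem << 2) | ((n >> (2 * k)) & 0b11)   # bring down the next pair of bits
        trial = (root << 2) | 1                      # candidate subtrahend for a result bit of 1
        root <<= 1
        if rem >= trial:
            rem -= trial
            root |= 1
    return root

# Illustrative radicand, checked against Python's built-in integer square root
n = 1749**2 - 1133**2 - 580**2
d = isqrt_longhand(n)
```

Each step uses only shifts, a comparison and a subtraction, which is why the method maps well to a small hardware block.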

#### IV. RESULTS OF MEASURING

The hardware of the realized system was verified using a set of different energy-saving light bulbs. They were chosen as benchmarks for simple nonlinear loads characterized by small nominal power. The intention is to show that the small numbers that may appear after the subtraction in Eq. (6) do not play an important role in nonlinear load detection. This is due to the good resolution of the data provided by a standard electronic power meter.

Fig. 4 illustrates how the implemented system operates in conjunction with an ordinary electronic power meter. This meter is produced by EWG electronics [14]. It complies with the standard IEC 62052-11 [15]. Note that this meter already provides $I_{\text{RMS}}$, $V_{\text{RMS}}$, P and $Q_{\text{B}}$, according to Eq. (3) and Eq. (4), respectively. The data from the power meter are read using its optical head (optical port) and transmitted to the Altera DE2 board through the RS232 port. Thereafter, calculating $D_{\text{B}}$ as stated by Eq. (6) is straightforward. The developed hardware is compatible with a wide range of other power meters that meet similar specifications regarding standards, type of output data and optical port.

Table II summarizes the measurement results collected from the meter ($V_{\text{RMS}}$, $I_{\text{RMS}}$, P, $Q_{\text{B}}$) and provided by the proposed dongle (S and $D_{\text{B}}$). The value of distortion power calculated by the FPGA on the Altera DE2 board appears on the seven-segment display, as illustrated in Fig. 4.



Fig. 4. Realized system for distortion power measurement

TABLE II

Results of measurement

| Loads | $U_{\rm RMS}$ [V] | $I_{\rm RMS}$ [A] | S [VA] | P [W] | $Q_{\rm B}$ [VAR] | $D_{\rm B}$ [VAR] | $D_{\rm B}/S$ [%] | $D_{\rm B}/P$ [%] |
|-------|-------------------|-------------------|--------|-------|-------------------|-------------------|-------------------|-------------------|
| Incandescent lamp 100W | 218.96 | 0.42 | 91.96 | 91.96 | 0.74 | 0.00 | 0.00 | 0.00 |
| FL 18W | 218.62 | 0.08 | 17.49 | 11.33 | -5.80 | 11.99 | 68.58 | 105.83 |
| CFL 20W bulb | 218.55 | 0.13 | 29.07 | 18.30 | -8.81 | 20.79 | 71.54 | 113.61 |
| CFL 20W helix | 219.01 | 0.14 | 30.66 | 18.61 | -9.38 | 22.49 | 73.35 | 120.85 |
| CFL 20W tube | 219.46 | 0.14 | 31.60 | 18.73 | -9.58 | 23.58 | 74.62 | 125.89 |
| CFL 15W bulb | 219.74 | 0.09 | 19.56 | 12.10 | -5.51 | 14.34 | 73.34 | 118.51 |
| CFL 11W helix | 221.73 | 0.08 | 17.74 | 10.42 | -5.38 | 13.31 | 75.03 | 127.74 |
| CFL 11W tube | 221.27 | 0.08 | 17.92 | 10.76 | 5.74 | 13.13 | 73.28 | 122.03 |
| CFL 11W E14 | 215.51 | 0.08 | 17.24 | 10.79 | -5.26 | 12.38 | 71.78 | 114.74 |
| CFL 9W bulb | 216.06 | 0.06 | 12.75 | 7.58 | -3.64 | 9.58 | 75.16 | 126.39 |
| CFL 7W spot | 217.75 | 0.04 | 9.58 | 5.83 | -2.87 | 7.04 | 73.48 | 120.75 |
| CFL 7W | 219.83 | 0.04 | 9.67 | 6.03 | -2.57 | 7.11 | 73.54 | 117.91 |
| CFL 15W helix | 218.55 | 0.15 | 32.13 | 18.95 | -10.26 | 23.83 | 74.17 | 125.75 |
| CFL 20W tube | 216.91 | 0.11 | 24.08 | 13.86 | -7.15 | 18.34 | 76.19 | 132.32 |
| LED Parlamp 15W (9x1.5W) | 217.27 | 0.157 | 34.11 | 16.9 | -3.87 | 29.38 | 86.12 | 173.85 |
| LED Parlamp (6x1.5W) | 217.51 | 0.114 | 24.80 | 12.89 | -2.74 | 21.00 | 84.71 | 162.92 |
| LED Bulb (7x1W) Warm White | 218.02 | 0.083 | 18.10 | 9.7 | -2.84 | 15.01 | 82.95 | 154.74 |
| LED Bulb (6x1W) Warm White | 217.93 | 0.042 | 9.15 | 7.76 | -0.14 | 4.85 | 53.00 | 62.50 |
| LED Bulb (6x1W) White | 217.85 | 0.045 | 9.80 | 8.34 | -0.16 | 5.15 | 52.53 | 61.75 |
| LED Bulb 3x1W | 217.9 | 0.034 | 7.4086 | 3.96 | -0.89 | 6.20 | 83.66 | 156.57 |
| LED MiniBulb 3x1W | 215.86 | 0.034 | 7.33924 | 3.91 | -1 | 6.13 | 83.52 | 156.78 |

Once the utility is able to register the level of distortion power at the Point of Common Coupling (PCC), it can exercise better control over the grid. The control mechanism may be exercised through the billing policy or by disconnecting large nonlinear loads. The first would reduce the losses caused by the inability to measure a considerable part of the supplied energy. The second could be activated to protect other consumers from irresponsible users.

#### V. CONCLUSION

This paper presented a single-phase system that can be used to detect each source of harmonic pollution in the power grid. It can be used with an off-the-shelf power meter. The advantage of this system is that it can be used as a dongle without any changes to the power meter. In many countries, old electrical power meters have recently been replaced by electronic meters. However, none of them is able to register distortion power, which in contemporary households and offices rises to values that cannot be ignored. The results in Table II indicate that the utility suffers large losses because distortion power is not registered. For most energy-saving light bulbs it is greater than the registered active power. As we have recently published in [4], [5], measuring distortion power at the PCC helps the utility to eliminate losses. Therefore, there is a need to attach additional inexpensive hardware that operates in conjunction with existing meters and upgrades their capabilities. The realized system allows the utility to detect and quantify the level of pollution from each customer, which makes the proposed system unique.

#### ACKNOWLEDGEMENTS

The results described in this paper were obtained within the project TR32004 funded by the Serbian Ministry of Science and Technology Development.

#### REFERENCES

- [1] Singh G.K.: "Power system harmonics research: a survey", European Transactions on Electrical Power, Vol. 19, 2007, pp. 151–172
- [2] "Harmonic Distortion in the Electric Supply System", Integral Energy Power Quality Centre: Technical note No. 3, March 2000
- [3] Stevanović, D., Petković, P.: "Harmonics in Power System Problems and Solutions (in Serbian)", Proceedings of XII International Scientific Symposium INFOTEH-JAHORINA 2013, Jahorina, Bosnia and Herzegovina, March 20-22, 2013, pp. 203-208, ISBN 978-99955-763-1-8
- [4] Stevanović, D., Jovanović, B., Petković, P.: "Simulation of Utility Losses Caused by Nonlinear Loads at Power Grid", Proceedings of Small System Simulation Symposium 2012, Niš, Serbia, February 12-14, 2012, pp. 155-160.
- [5] Stevanović, D., Petković, P.: "The Losses at Power Grid Caused by Small Nonlinear Loads", Serbian Journal of Electrical Engineering, Volume 10, No. 1, February 2013, Čačak, Serbia, 2013, pp. 209-217, ISSN 1451-4869
- [6] Xu, W., Liu, X., Liu, Y. (2003). "An investigation on the validity of power-direction method for harmonic source determination". IEEE Trans. Power Del., Vol. 18, No. 1, 214–219.
- [7] Cataliotti, A., Cosentino, V. (2009). "Disturbing loads identification in power systems: a single-point time-domain method based on the IEEE 1459-2000". IEEE Trans. Instrum. Meas., Vol. 58, No. 5, 1436–1445.
- [8] Cataliotti, A., Cosentino, V. (2009). "A single-point approach based on IEEE 1459-2000 for the identification of prevailing harmonic sources detection in distorted three phase power systems". *Metrol. Meas. Syst.*, Vol. 16, No. 2, 209–218.
- [9] Lee, S., Park, J. W. (2010). "New power quality index in a distribution power system by using RMP model". IEEE Trans. on Industry Applications, Vol. 46, No. 3, 1204–1211.
- [10] Chun, L., Xu, W., Tayjasanant, T. (2004). "A critical impedance based method for identifying harmonic sources". IEEE Trans. Power Del., Vol. 19, No. 2, 671–678.
- [11] Dimitrijević, M.: "Elektronski sistem za analizu polifaznih opterećenja baziran na FPGA", Ph.D. dissertation (in Serbian), Niš, December 7, 2012.
- [12] Stevanović, D., Petković, P.: "The Efficient Technique for Harmonic Sources Detection at Power Grid", Przegląd Elektrotechniczny, 2012, pp. 196-199, ISSN 0033-2097
- [13] Jovanović, B., Damnjanović, M., Litovski, V.: "Square Root on Chip", ETF Journal of Electrical Engineering, A Publication of the EE Department, University of Montenegro, Vol. 12, May 2004, pp. 65-75, YU ISSN 0353-5207
- [14] EWG multi metering solutions, www.ewg.rs
- [15] Electricity metering equipment (AC) - General requirements, tests and test conditions - Part 11: Metering equipment, IEC 62052-11, February 2003.

# Application of a bidirectional electricity meter in the 5kW grid-connected photovoltaic power plant

#### Zoran Petrušić and Andrija Petrušić

*Abstract* - This paper presents the application of a bidirectional electricity meter in a 5 kW grid-connected photovoltaic (PV) plant. The PV plant has been operating since August 2013 as a research laboratory at the Faculty of Electronic Engineering in Niš. It enables performance evaluation of different PV system components under environmental conditions. The realized PV system consists of a rotating and a fixed part. The solar tracker can carry up to 10 PV modules of different technologies, power and dimensions, while the fixed construction can support up to 12 PV modules. The PV plant is connected to the grid within an industrial hall through the concept of net metering. The contemporary system for measuring generated and consumed electric energy is also presented in the paper. This system is realized using the bidirectional electricity meter.

*Keywords* – bidirectional meter, on-grid PV plant, net metering, monitoring system.

#### I. INTRODUCTION

Renewable energy sources (RES) represent a healthy alternative to conventional energy sources, taking into consideration all the negative impacts that fossil fuels have on the environment. For most countries, achieving significant levels of RES exploitation will lead to a certain degree of independence from imports of primary sources, thus making RES part of a country's strategic development.

#### A. The strategic importance of PV

Photovoltaic (PV) solar energy is one of the most distinctive RES. Its main characteristics are unlimited availability and the possibility of installation in close proximity to the consumer without the need for significant investment in existing infrastructure. Over the years, the technological maturity and increased energy efficiency of PV modules have made this kind of investment cost-effective even at the level of household consumers. In combination with adequate policy instruments, a significant increase in the installation of small-scale PV plants in Europe has been noticed, with Germany as the best example, even with its moderate solar energy potential [1].

#### B. Overview of successful projects

A good example of German success is the project [2] conducted in 1994 and funded by the German Federal Ministry for Research and Technology and the governments of the participating German federal states, under which 2000 on-grid PV plants were installed with power outputs ranging from 1 kW to 5 kW and a total installed power of 5 MW. Since the lack of available space is always an issue in urban areas, the biggest potential for PV plant installations lies in rooftops. Therefore, all the PV systems within the project were installed on or integrated into the south side of household rooftops.

The key aspect of the project was the development of a special program for long-term tracking and analysis of the installed PV plants (L-MAP), which involved all eminent national institutes.

Under the global monitoring program, each PV plant contained three meters measuring the total generated electric energy, the excess energy fed into the grid, and the locally consumed energy. The detailed field measurements were initiated to perform a comparative analysis between efficient and less efficient PV installations, as well as to gain experience for wider integration and further optimization of on-grid PV systems.

The larger presence of PV installations in overall energy production has led to an increase in the number of PV laboratories around the globe.

For instance, in South Korea the Field Demonstration Test Center with four 3 kW on-grid PV systems was developed [3]. The designed monitoring system enables measurement and analysis of PV system performance in relation to meteorological conditions. Furthermore, conditions were created for long-term assessment of installed components, as well as of entire PV systems, through various indicators such as efficiency, capacity and generated electric energy.

Another example of a PV laboratory is the 10 kW PV system that is part of the energy farm within the School of Renewable Energy Technology of Naresuan University in Thailand [4] [5] [6], which was designed to enable detailed efficiency and performance assessment of PV systems and individual components. Besides the comparison of different module technologies and outputs, the short-term and long-term dynamics of energy efficiency and other parameters can be observed as well. In addition, the system was designed modularly, allowing full development to the level of a microgrid system, including integration of other alternative technologies for the primary source, e.g. biomass and fuel cells.

Zoran Petrušić is with the Department of Microelectronics, Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, 18000 Niš, Serbia, E-mail: zoran.petrusic@gmail.com

Andrija Petrušić is with the Innovation Center of Advanced Technologies of Niš, Vojvode Mišića 58/2, 18000 Niš, Serbia, E-mail: andrija.petrusic@icnt.rs

#### C. The PV laboratory

The PV laboratory within the Faculty of Electronic Engineering, University of Niš, is a combined PV system of 5 kW installed power integrated within the facilities of a small-scale industrial consumer with a volatile load curve. It is designed for conducting multifunctional and multidisciplinary assessments and analyses (stationary vs. rotating systems, measurement of performance and U-I characteristics of PV modules under outdoor working conditions, the impact of wind on the supporting construction, the impact of temperature and other external conditions on the performance of the PV plant, etc.). The implemented PV laboratory is an open system able to integrate other RES such as wind turbines, fuel cells, biomass generators, and accumulator banks for energy storage, which will allow further comparative analysis.

With the connection of the PV system to the distribution grid, the conditions have been met for the development of a modern system for monitoring various performance parameters. The current supervision system of the PV laboratory enables data acquisition from the inverters, the installed meteorological unit and the installed bidirectional meter. The monitoring information system is based on non-standardized data acquisition and storage methods, which are combined hardware and software solutions distributed as technical support by the manufacturers of the aforementioned components.

The data that can be collected from the inverters produced by DIEHL (Germany), from the Platinum series, include parameters such as: DC input power, AC output power, inverter status, operating time, network signal, inverter alarms, etc.

For collecting meteorological data, a professional meteorological unit WXT510 produced by Vaisala (Finland) is used. This device measures six meteorological parameters: wind direction and speed, rainfall intensity, atmospheric pressure, air temperature, and relative humidity.

A bidirectional meter manufactured by ATLAS-AMR (Serbia) is used for collecting overall system performance data through various parameters: active and reactive electric power for different tariffs, in both directions and for all four quadrants [7], as well as maximum min power and real-time values of current, voltage, power, frequency and power factor in a three-phase system with four or three conductors.
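To illustrate what measurement "in all four quadrants" means, the sketch below classifies signed active and reactive power readings and accumulates them into separate import and export registers. It assumes the commonly used convention in which positive active power means energy imported from the grid and quadrants are numbered 1 to 4 by the signs of P and Q; the register labels (`A+`, `A-`, `R1` to `R4`), the function names and the sample readings are hypothetical and are not taken from the ATLAS-AMR meter.

```python
def quadrant(p_w, q_var):
    """Return the quadrant (1-4) for signed active power p_w and reactive power q_var."""
    if p_w >= 0:
        return 1 if q_var >= 0 else 4
    return 2 if q_var >= 0 else 3

def accumulate(readings, dt_h):
    """Integrate (P, Q) readings taken every dt_h hours into energy registers (Wh, varh)."""
    regs = {"A+": 0.0, "A-": 0.0, "R1": 0.0, "R2": 0.0, "R3": 0.0, "R4": 0.0}
    for p_w, q_var in readings:
        if p_w >= 0:
            regs["A+"] += p_w * dt_h       # active energy imported from the grid
        else:
            regs["A-"] += -p_w * dt_h      # active energy exported to the grid
        regs["R%d" % quadrant(p_w, q_var)] += abs(q_var) * dt_h
    return regs

# Hypothetical 15-minute readings: two with import, two with export of active power
readings = [(1200.0, 300.0), (-800.0, 150.0), (-500.0, -100.0), (400.0, -50.0)]
regs = accumulate(readings, dt_h=0.25)
```

Keeping the import and export registers separate is what allows the same meter to serve both net metering and feed-in style accounting.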

In the construction phase, the focus was on the bidirectional meter because of the potential that PV systems with net metering will have in the future. In cooperation with a domestic producer of programmable electronic meters (ATLAS-AMR), an existing meter was modified to measure in all four quadrants (in both directions of current flow). Suitable modifications were also made to the supporting software application, which collects all measured data for storage in a centralized database.

This paper presents two independent energy source installations in the on-grid PV system. The configuration for generation, transmission and consumption of electric energy is given as well. Furthermore, the basic functionalities of the software framework are presented. The use of a bidirectional meter for net metering is a novelty in this region; therefore, special attention is given to this concept of metering.

#### II. SYSTEM DESIGN AND COMPONENTS

A PV power plant converts sunlight into electric energy. The realized grid-connected PV system (figure 1) has two sources of electric energy:

1) A 3 kW PV power plant with a fixed mechanical construction placed on the ground, with PV modules in monocrystalline and polycrystalline technology. The fixed system on which the PV modules are installed is oriented to the south, with the possibility of changing the inclination from 30° to 45° in order to experiment with the optimal tilt angle for our geographical latitude. Changing the inclination makes it possible to determine how the annually generated electric energy decreases as a function of the deviation from the optimal orientation.

2) A 2 kW PV power plant installed on a two-axis rotating solar tracker, also with PV modules in monocrystalline and polycrystalline technology. The realized rotating system belongs to the group of small positioners and enables experimental determination and confirmation of the theoretical advantages in electric energy generation associated with single-axis and two-axis systems in large PV power plants.



Fig. 1. Sources of electric energy in the realized grid connected PV system

Grid-connected PV systems consist of a PV string, one or more grid-connected solar inverters, a safety device for automatic shutdown when the grid is disconnected, and an electricity meter. The grid inverter converts direct current (DC) from the PV modules into alternating current (AC), which is synchronized with the grid.

Currently there is no global stance on the size and number of grid-connected PV systems. The economic factors that limit the optimal size of a grid-connected PV system depend mainly on the financial incentives and legal regulations of individual countries.

Today, there are two dominant concepts in the world. The concept of "net metering", which is widely used in the U.S. and Canada, relies on grid-connected PV systems dimensioned to generate electricity at the level of the consumer's own consumption, so that the power distribution network acts only as local energy storage. There are no incentives or legal regulations for building a system that would generate more electricity; instead, there are limits on the maximal power [8]. Since local energy storage facilities are not needed, the limiting factors for the size of a grid-connected PV system under this concept are the available space (often the roof), investment costs, and regulatory frameworks, including subsidies and promotional programs. The other concept, known as "feed-in", which is present in Germany and other EU countries, relies on financial incentives for the construction of large PV systems and allows a net profit from excess energy. Despite all the economic benefits, generated photovoltaic energy has not yet reached grid parity, that is, the point at which the cost of producing energy and the cost of buying it from the grid are the same.

In Serbia, there are only PV systems with a feed-in tariff, but the quota is limited and allocated every three years. The quota is limited to 5 MW for ground-mounted systems and 4 MW for rooftop systems. Net metering is not yet legally regulated in Serbia, but regulation is likely in the near future, and it should be adopted, since almost all countries have adopted it. Net metering is used in cases where the prices of purchased and delivered energy are the same. Since the realized PV plant does not belong to the group of privileged energy producers, a bidirectional smart metering group was installed with the aim of carrying out various research, because it will be an inevitable component of the future smart grid.
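The difference between the two concepts can be shown with a small billing sketch based on the import and export registers of a bidirectional meter. All figures (energy amounts, energy price and feed-in tariff) are hypothetical and serve only to show the arithmetic; the function names are likewise assumptions of this sketch.

```python
def net_metering_bill(imported_kwh, exported_kwh, price_per_kwh):
    """Net metering: imported and exported energy have the same price, so only the net is billed."""
    return (imported_kwh - exported_kwh) * price_per_kwh

def feed_in_bill(imported_kwh, exported_kwh, price_per_kwh, feed_in_tariff):
    """Feed-in: imported energy is bought at the grid price, exported energy is paid at the tariff."""
    return imported_kwh * price_per_kwh - exported_kwh * feed_in_tariff

# Hypothetical monthly register values and prices (currency units per kWh)
imported, exported = 620.0, 450.0
bill_net = net_metering_bill(imported, exported, price_per_kwh=0.10)
bill_feed_in = feed_in_bill(imported, exported, price_per_kwh=0.10, feed_in_tariff=0.20)
```

A negative result under the feed-in scheme means that the producer receives a net payment, whereas under net metering the grid only offsets the producer's own consumption.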

The structure of the realized grid-connected PV plant with the basic components for generation and transmission of energy is shown in figure 2.



Fig. 2. Block diagram of the realized 5kW grid-connected PV plant

#### **III. MONITORING SYSTEM**

This part of the paper presents a system for monitoring the energy generated and consumed by the realized 5 kW modular PV system (figure 3). The concept of this monitoring system fits well with modern smart grids. Smart grids can cost-effectively integrate the behavior and actions of all connected users (generators, consumers, and those that do both) in order to provide a sustainable power system with low losses and high levels of safety, quality and security of supply. The main objective of this system is real-time acquisition of the operating parameters of the PV power plant.

Some of the benefits obtained by realization of such a system are:

- Monitoring of profitability with advanced tools for data analysis,
- Automatic access to the reports,
- Easy way to generate appropriate graphs, tables and reports,
- Web based monitoring and control with application access from multiple levels.



Fig. 3. Realized modern system for monitoring of the 5kW grid-connected modular PV system

During the grid connection of the PV system, it was decided to install a modern programmable electricity meter as a control meter. This fulfills all the requirements set by the advanced metering infrastructure and, at the same time, actively involves the meter in new systems for monitoring and management of energy consumption.

The installed meter is a modified four-quadrant, three-phase electricity meter for four-wire connection with three measuring systems. Tariff management is performed by the meter's built-in real-time clock with calendar. The meter supports different types of communication modules in order to fully support active management. The available communication modules are: GPRS/GSM, PLC, RS232/RS485, MBUS, wireless MBUS, ZigBee, RF. The realized system is based on RS232 communication due to the proximity of the PC server.

The installed electronic programmable meter monitors 13 parameters of the distribution network, active energy, reactive L and C energy, as well as the maximal output power. A screenshot of the realized communication software for data transfer into the database is shown in figure 4.

Also, every 15 minutes the meter automatically generates the mean power for the previous hour; these data are essential for optimization of electricity consumption by



Fig. 4. Screenshot of the application for manipulating the data from the bidirectional meter

#### IV. SOFTWARE FRAMEWORK

The software framework is based on three functional segments:

- 1. Data acquisition – collecting data from various installations within the PV system, such as inverters, the bidirectional meter, the meteorological unit, etc.
- 2. Data storage – at a predefined time interval, collected measurements are stored in the database (SQL Server 2012), creating a base for further analysis and report generation.
- 3. Data overview – presentation of all data via a web application that supports the complete management of the PV plant (reporting, alarms, georeferencing, etc.).

#### A. Data input from bidirectional meter

The software provided by the manufacturer generates daily XML files containing data from the bidirectional meter, collected from the parameter measuring sensors at a configurable time interval. The custom software solution is implemented as a "Windows service" [9] that runs continuously in the background. It wakes up every hour, checks for new records in the XML file, and writes each new record to the database (Fig. 5).



Fig. 5. Reading and storing measured data from bidirectional meter
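
A minimal sketch of the hourly import step described above. It assumes a hypothetical XML layout (`record` elements with `time`, `active_kwh` and `reactive_kvarh` attributes) and uses Python's built-in SQLite in place of SQL Server 2012 so that the example is self-contained. The actual file format and service code are not given in the paper.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical daily XML layout; the real schema of the meter software is not known.
SAMPLE_XML = """<readings>
  <record time="2014-02-12T10:00:00" active_kwh="1.25" reactive_kvarh="0.10"/>
  <record time="2014-02-12T10:15:00" active_kwh="1.31" reactive_kvarh="0.12"/>
</readings>"""


def init_db(conn):
    # The timestamp is the primary key, so each record can be stored only once.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meter_readings ("
        "time TEXT PRIMARY KEY, active_kwh REAL, reactive_kvarh REAL)"
    )


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM meter_readings").fetchone()[0]


def import_new_records(conn, xml_text):
    """Store records that are not yet in the database; return how many were added."""
    before = count_rows(conn)
    for rec in ET.fromstring(xml_text).iter("record"):
        # OR IGNORE skips records whose timestamp was imported in an earlier pass.
        conn.execute(
            "INSERT OR IGNORE INTO meter_readings VALUES (?, ?, ?)",
            (rec.get("time"), float(rec.get("active_kwh")),
             float(rec.get("reactive_kvarh"))),
        )
    conn.commit()
    return count_rows(conn) - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    print(import_new_records(conn, SAMPLE_XML))  # first hourly pass
    print(import_new_records(conn, SAMPLE_XML))  # second pass over the same file
```

Using the timestamp as the key makes the import idempotent, so the service can re-read the whole daily file every hour without creating duplicates.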

#### B. Data overview

To use the collected data in the most effective way, a subsystem for data presentation has been developed. This reporting system was developed to satisfy the following needs:

- To represent the total amount of generated electric power;
- To make a correlation between meteorological conditions and generated electric power;
- To facilitate the analysis of electricity consumption and yield up to annual level;
- To allow the use of data for further system development

The system for data overview relies on SQL Server Reporting Services. The advantages of using this technology are easy and fast generation of new reports and online accessibility.

Based on the aforementioned needs, different categories of reports can be generated. First, the consumption reports allow tracking of consumption and defining patterns based on the data collected from the bidirectional meter. These reports provide useful information on consumer behavior in using electric energy from the grid, as well as on periods when the needs are satisfied locally. Second, the reports on generated electricity give information on the electric energy generated by the PV modules. Correlation reports are also possible, where a dependency between meteorological factors and produced electric energy can be determined, which allows further analysis and predictions.
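
One simple way to quantify the dependency shown in a correlation report is the Pearson correlation coefficient. The sketch below is only an illustration: the paper does not state which statistic the reports use, and the hourly values are made up rather than taken from the plant.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up hourly values for illustration only (not measurements from the plant).
irradiance_w_m2 = [120, 340, 560, 780, 650, 300]
energy_kwh = [0.5, 1.4, 2.4, 3.3, 2.7, 1.2]

if __name__ == "__main__":
    r = pearson(irradiance_w_m2, energy_kwh)
    print(f"correlation between irradiance and generated energy: {r:.3f}")
```

A coefficient close to 1 indicates that generated energy follows irradiance almost linearly; a lower value would point to other factors such as module temperature or shading.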

However, the core of the monitoring system is the alarm management system, which is defined through the following steps:

- 1. Acquisition – the alarms are automatically collected from all devices or are generated by software on the server side after certain conditions are met.
- 2. Analysis procedure – before sending, alarms are automatically filtered based on category, priority level and time, which allows efficient system maintenance.
- 3. Distribution – the information can be sent to multiple users, each with competence in a specific field (technical, economic, etc.).
- 4. History – used for the analysis or representation of potential failures of PV plant components.
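
The four steps above can be sketched as a small pipeline. The class names, the priority scale, the repeat-suppression interval and the routing table are all hypothetical; the paper does not describe the actual filtering rules.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    source: str      # device or server-side rule that raised the alarm
    category: str    # e.g. "technical" or "economic"
    priority: int    # 1 = highest
    timestamp: int   # seconds since some reference time


@dataclass
class AlarmManager:
    # Hypothetical routing table: category -> recipients competent in that field.
    recipients: dict
    max_priority: int = 2      # forward only priorities 1..max_priority
    min_gap_s: int = 600       # suppress repeats from one source within this gap
    history: list = field(default_factory=list)
    _last_sent: dict = field(default_factory=dict)

    def process(self, alarm):
        """Run one acquired alarm through analysis, distribution and history."""
        self.history.append(alarm)                       # step 4: keep everything
        if alarm.priority > self.max_priority:           # step 2: priority filter
            return []
        last = self._last_sent.get(alarm.source)
        if last is not None and alarm.timestamp - last < self.min_gap_s:
            return []                                    # step 2: time filter
        self._last_sent[alarm.source] = alarm.timestamp
        return self.recipients.get(alarm.category, [])   # step 3: route by category


if __name__ == "__main__":
    mgr = AlarmManager(recipients={"technical": ["maintenance"]})
    print(mgr.process(Alarm("inverter-1", "technical", 1, 0)))
    print(mgr.process(Alarm("inverter-1", "technical", 1, 120)))  # repeat, suppressed
```

Suppressed alarms are still written to the history, so a later failure analysis sees the full sequence of events.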

#### V. CONCLUSION

This paper presents the structure of the implemented PV system for production, transmission and consumption of electric energy in small-scale industrial facilities. Currently, the PV system has an output power of 5 kW, but scaling the production as a function of consumption needs will be the subject of further research. The orientation toward the concept of net metering resulted in the integration of a bidirectional meter and the development of a software solution for data acquisition and storage.

By introducing the system for bidirectional metering and developing it further, the conditions have been met for the development of a hybrid management system that allows optimization and adjustment of consumption in relation to optimal electric energy production from the PV modules within small-scale industrial on-grid systems.

#### ACKNOWLEDGEMENT

The research presented in this paper is financed by the Ministry of Education, Science and Technological Development of the Republic of Serbia under the project TR33035.

#### References

- [1] Marcel Šúri, et al., "Potential of solar electricity generation in the European Union member states and candidate countries", Solar Energy Vol. 81, 2007, pp. 1925.
- [2] Decker, B., Jahn, U., "Performance of 170 Grid connected PV Plants in Northern Germany-Analysis of Yields and Optimization Potentials", Solar Energy Vol. 59, Nos. 4-6, 1997, pp. 127-133.
- [3] Jung, H. S., Young, S. J., Gwon J.Y., Ju, Y. C., Jae H. C. "Performance results and analysis of 3 kW grid-connected PV systems", Renewable Energy, 32, 2007, pp. 1858–1872.
- [4] Sasitharanuwat, A., Rakwichian, W., Ketjoy N., Suponthana, W., "10 kW Multi Photovoltaic Cell Stand-alone/Grid Connected System for Office Building", 15th PVSEC, October 2005, Shanghai, China, Volume 1, pp. 638-639.
- [5] Sasitharanuwat, A., Rakwichian, W., Ketjoy N., Suponthana, W., "Designs and Testing of a 10 kWp Standalone PV Prototype for Future Community Grid Adapted for Remote Area in Thailand", International Journal of Renewable Energy, Vol. 1, No. 2, July 2006, pp. 31-43.

- [6] Sasitharanuwat, A., Rakwichian, W., Ketjoy N., Ketjoy, N., "Performance evaluation of a 10 kWp PV power system prototype for isolated building in Thailand", Renewable Energy 07/2007; 32(8), pp.1288-1300.
- [7] Marcoci, A., Raffaelli, S., Sanjuan, J. M. G., Cagno, E., Micheli, G. J. L., Mauri, G., Urban, R., "The Meter-ON project: how to support the deployment of Advanced Metering Infrastructures in Europe?", CIRED, 22nd International Conference on Electricity Distribution, Stockholm, 10-13 June 2013, CIRED2013 Session 6, Paper No. 1261.
- [8] Massachusetts Net Metering First Bill Walkthrough, National Grid Net Metering Webpage MA, http://www.nationalgridus.com/masselectric/home/energyeff/4_net-mtr.asp, http://www.nationalgridus.com/masselectric/business/energyeff/4_net-mtr.asp
- [9] Introduction to Windows Service Applications, http://msdn.microsoft.com/en-us/library/d56de412%28v=vs.110%29.aspx

# Natural patterns for design and control of biwhegs in quadruped robot

Goran S. Đorđević, Miloš Petković, Darko Todorović

This paper was late for the printing in the proceedings. Do try the SSSS'14 DISC.

### Parallel Circuit Simulation on Graphical Processing Unit

Aram Baghdasaryan

Synopsys Armenia CJSC 3<sup>rd</sup> year PhD student, State Engineering University of Armenia Yerevan, Armenia aramb@synopsys.com

*Abstract*— High integration in IC design and mixed-signal VLSI design have brought new complexity to IC design. This complexity brings new challenges for IC simulation time. There is interest in speeding up Spice [1] simulation because simulation of a large IC can take several days. On average, 75% of simulation time is spent evaluating transistor model equations. This report discusses accelerating transistor model evaluation using a Graphical Processing Unit (GPU). To reduce simulation time further, scheduling algorithms are used to schedule tasks according to running-time criteria. According to the results, the method presented in this paper speeds up simulation by up to 2.5 times.

*Keywords*—GPU, scheduling algorithms, simulation, speed up, parallel.

#### I. INTRODUCTION

Nature is analog, and interaction with nature is also analog. Analog circuits are necessary where the area, power, or high-frequency requirements cannot be met by digital circuits. Analog circuits are used in microprocessor supervisory circuits, massively parallel analog signal processors, switched-current filters, etc. The integrated circuit industry has grown phenomenally during the last decades. Simple gates and operational amplifiers appeared in the mid-1960s, and microprocessors and analog-to-digital converters followed in the 1970s. Approximately 60% of CMOS and BiCMOS circuits mix analog and digital parts, so analog design has become a part of most digital circuits. However, high integration in IC design and mixed-signal VLSI designs have brought new complexity to IC design. The lack of a formal analog design formulation and of circuit-independent design procedures makes analog design simulation a complex and time-consuming process. Simulation of a large analog design can take several days. This growth in IC design therefore brings new challenges to computer-aided design systems.

If the simulation results do not satisfy the specification, the simulation is repeated until they do. If the results still do not satisfy the specification after several iterations, the designer must change the circuit design and repeat the iterations described above. This process is time consuming and increases the total simulation time, which is why the number of iterations is limited.

The laws of circuit theory (Kirchhoff's and Ohm's laws) are not enough to design a functional circuit. An analog circuit designer also needs other techniques and knowledge. There are several approaches which help to predict circuit behavior.

#### A. Analytic design equation

Simple analytic design equations predict circuit behavior with sufficient accuracy. Many methods (small-signal modeling, analysis methods) have been developed for solving these complex equations without much loss of accuracy. Another approach is based on qualitative relationships.

#### B. Qualitative relationships

Qualitative relationships between circuit performance and design variables help to understand circuit behavior. For example, the voltage gain of a CMOS amplifier depends on the value of the DC bias current, and the gain can be improved by reducing that current. This type of knowledge helps designers to choose appropriate values for the variables, which leads to circuit optimization.
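
As a worked illustration of this relationship, consider a common-source stage loaded by an ideal current source, with the transistor in strong inversion and described by the square-law model. These assumptions are introduced here for the example and are not stated in the text. The small-signal gain is then the product of the transconductance $g_m$ and the output resistance $r_o$:

$$
g_m = \sqrt{2 \mu C_{ox} \frac{W}{L} I_D}, \qquad
r_o = \frac{1}{\lambda I_D}, \qquad
|A_v| = g_m r_o = \frac{1}{\lambda} \sqrt{\frac{2 \mu C_{ox} (W/L)}{I_D}} \propto \frac{1}{\sqrt{I_D}}
$$

Here $I_D$ is the DC bias current, $\mu C_{ox}$ the process transconductance parameter, $W/L$ the transistor aspect ratio and $\lambda$ the channel-length modulation coefficient. Under these assumptions, halving the bias current raises the gain by a factor of $\sqrt{2}$, at the cost of a lower $g_m$ and therefore lower bandwidth.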

#### II. SIMULATION

IC design complexity brings new challenges to computer-aided design systems. The first automation tools targeted the optimization phase. These tools are limited for several reasons.

- Good starting point. If the designer specifies a bad starting point, the result is a bad circuit design. This issue can be overcome with random starting points, but such methods are time consuming.
- Circuit optimization knowledge. The designer has to define the parameters which the optimizer tool can change; otherwise, badly chosen parameters can degrade the optimization process.
- Speed. These systems are slow because they involve simulation in the optimization loop.

Despite these limitations, CAD tools improve analog design in the following ways.

- Reduce design time. This helps to enter the market early.
- Make the design process simple. This allows designers to implement standard analog cells very quickly.
- Reduce the probability of errors in design. Automation systems help to decrease the design cycle/success ratio.
- Improve manufacturing yield. Computer-aided design systems can improve manufacturing yield and increase profitability.
- Reduce production cost. Reducing the time needed for analog design reduces production cost.

#### **III. PREVIOUS WORK**

Parallel simulation of IC designs is not a new topic, and there is much research on reducing simulation time. The increasing number of elements in integrated circuits (ICs) brings new challenges for simulation tools. Nowadays, simulation with Spice or with direct-method simulators on a scalar processor is a time-consuming process. Circuits contain parallelism that can be used to speed up simulation. There are two ways to reduce simulation time: develop more effective algorithms or use more powerful systems.

#### A. Algorithms

The multilevel Newton algorithm and the waveform relaxation algorithm are used for circuit decomposition [2]. The circuit is divided into sub-circuits (Fig. 1), and the specification of each level must be satisfied by the level below it. Instead of satisfying one large specification at the top level, which is difficult, the design at every stage only has to satisfy a sub-specification.



Fig. 1. Sequential decomposition

The circuit is divided according to function and structure.

- Functional
- Structural

In the functional approach, each sub-circuit should correspond to a well-defined function. Structural division is related to the input/output characteristics of a circuit block. Under this classification there are four types of blocks: modules, building blocks, task blocks and primitive blocks.

Modules are complex compared to the other blocks because they contain a significant number of devices. Examples of modules are operational amplifiers, comparators, voltage references, etc. Building blocks perform a simple function and are used inside modules. A typical example of a building block is a current mirror, which performs the simple function of mirroring a current.

The hierarchical approach has the following advantages.

- It makes problems easier to solve, because a difficult problem is divided into smaller and easier sub-problems ("divide and conquer").
- It covers a wide range of performance, because a single circuit can be reused in different circuit design architectures.

The disadvantage of this approach is that a large number of feedback loops in a circuit can increase simulation time.

#### B. Hardware

The recent shift in hardware design (multi-core processors, clusters) brings new challenges to simulation tools. Most methods which perform IC simulation on clusters and supercomputers [3] do not use shared memory, and the interconnections between processors increase simulation time. In the case of simulation on a GPU [4], there is shared memory, which helps to reduce simulation time.

#### **IV. SCHEDULING ALGORITHMS**

The need for scheduling appears in multicore architectures when several tasks are ready to be executed [5]. There are two possible situations for a running task: it can be displaced (preempted) by other tasks, or it can block other tasks until its completion. Accordingly, there are two types of scheduling algorithms: non-preemptive and preemptive (displacing).

In non-preemptive scheduling, a task runs for as long as it needs to complete. Other tasks must wait; they can run only when the previous task completes or waits for the completion of an input or output operation. This method is very simple, but it carries the risk that the processor stays occupied when an error occurs during execution and the current task cannot hand control to other tasks. In preemptive (displacing) scheduling, every task receives the same execution time (quota). When the quota expires, the task is interrupted and the next quota is assigned to the next task. There is no risk of blocking task execution as in the non-preemptive scheduling algorithm. The following scheduling algorithms are in widespread use.

#### A. First Come First Served (FCFS)

The simplest scheduling algorithm is FCFS. When a task is ready to be executed, it is added to the end of the queue of ready tasks. The task to run is taken from the beginning of the queue. The advantage of this algorithm is that it is easy to implement.
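
A minimal sketch of FCFS for tasks that are all ready at time zero. The task names and execution times are hypothetical and serve only to show how the queue order determines waiting time.

```python
from collections import deque


def fcfs(tasks):
    """Run (name, burst_ms) tasks in arrival order; return the waiting time of each."""
    queue = deque(tasks)               # new tasks join at the end of the queue
    clock = 0
    waiting = {}
    while queue:
        name, burst = queue.popleft()  # the next task is taken from the front
        waiting[name] = clock          # time it waited before it started
        clock += burst                 # it then runs to completion
    return waiting


if __name__ == "__main__":
    result = fcfs([("T1", 24), ("T2", 3), ("T3", 3)])
    print(result, "average:", sum(result.values()) / len(result))
```

With the long task first, the two short tasks wait 24 ms and 27 ms; if the short tasks arrived first, the average waiting time would be much lower.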

#### B. Round Robin (RR)

RR is a modification of the FCFS scheduling algorithm. The difference from FCFS is that tasks which are ready to be executed are stored in a circular queue. Every task receives a time quota of 10-100 ms, and when this time expires the next task starts to run. There are two possible cases. 1. The execution time of the task is less than the time quota. In this case the task is removed before the quota expires and another task is executed. 2. The execution time is longer than the time quota. In this case the task runs for exactly one quota and is then moved to the end of the queue. When the quota is so large that most tasks complete within it, RR behaves like FCFS. With a small quota, the average waiting and running times are short in theory, but in real systems the switching time between tasks increases the running time.
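
The two cases and the effect of the quota size can be reproduced with a short simulation. This is a sketch under simplifying assumptions: all tasks are ready at time zero, and the switching overhead is a constant (`switch_cost`, a parameter introduced here) charged after every time slice that is followed by another one.

```python
from collections import deque


def round_robin(tasks, quota, switch_cost=0):
    """Simulate RR for (name, burst_ms) tasks; return the completion time of each."""
    queue = deque(tasks)
    clock = 0
    finished = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quota, remaining)   # case 1: the task may end before the quota expires
        clock += run
        if remaining > run:           # case 2: quota used up, go to the back of the queue
            queue.append((name, remaining - run))
        else:
            finished[name] = clock
        if queue:
            clock += switch_cost      # assumed constant cost of starting the next slice
    return finished


if __name__ == "__main__":
    tasks = [("T1", 24), ("T2", 3), ("T3", 3)]
    print(round_robin(tasks, quota=4))
    print(round_robin(tasks, quota=100))              # large quota: same order as FCFS
    print(round_robin(tasks, quota=4, switch_cost=1))
```

With a quota of 4 ms the short tasks finish early, with a quota of 100 ms the schedule is identical to FCFS, and a non-zero switching cost delays every completion.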

#### C. Shortest-Job-First (SJF)

The performance of the FCFS and RR algorithms depends on the order of the tasks. Performance increases when tasks with short execution times run first. The algorithm that works according to this criterion is called Shortest-Job-First (SJF). If several tasks have the same shortest execution time, their running order is selected by the FCFS algorithm. There are two types of SJF algorithm: non-preemptive and preemptive (displacing). In the non-preemptive variant, the running task is not affected by new tasks that appear in the system during its execution. In the preemptive variant, when a new task appears whose execution time is shorter than the remaining execution time of the running task, the algorithm displaces the running task and gives the processor to the new task.
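
Both SJF variants can be sketched with one unit-step simulation. The tasks, arrival times and execution times below are hypothetical, and time advances in whole units to keep the example short.

```python
def sjf(tasks, preemptive=False):
    """Simulate SJF for (name, arrival, burst) tasks; return completion times.

    Ties are broken by arrival time, that is, in FCFS order.
    """
    remaining = {name: burst for name, _, burst in tasks}
    arrival = {name: arr for name, arr, _ in tasks}
    finished = {}
    clock = 0
    current = None
    while remaining:
        ready = [n for n in remaining if arrival[n] <= clock]
        if not ready:
            clock += 1                # processor idles until the next arrival
            continue
        if preemptive or current not in remaining:
            # Pick the job with the least remaining time; earlier arrival wins a tie.
            current = min(ready, key=lambda n: (remaining[n], arrival[n]))
        remaining[current] -= 1       # run the chosen task for one time unit
        clock += 1
        if remaining[current] == 0:
            finished[current] = clock
            del remaining[current]
    return finished


if __name__ == "__main__":
    tasks = [("A", 0, 8), ("B", 1, 4), ("C", 2, 2)]
    print(sjf(tasks))                   # A is never interrupted
    print(sjf(tasks, preemptive=True))  # B and C displace A when they arrive
```

In the non-preemptive run the long task A blocks the processor until time 8, while in the preemptive run the short tasks finish first and A completes last.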

#### V. METHOD FORMULATION

The general structure of the algorithm is shown in Fig. 2.



Fig. 2. General structure of the algorithm

In the first stage, the circuit is divided into sub-circuits according to the hierarchical approach. Parallel simulation scheduling and synchronization are implemented in the master scheduler. Tasks are scheduled according to their execution times, and the scheduling algorithm is selected depending on those times. First, all tasks are ordered according to the Shortest-Job-First algorithm. If several tasks have equal execution times, these tasks are selected according to the Round Robin algorithm. In the second stage, tasks are assigned as follows: a task which contains many interactions runs on the GPU, because the GPU provides many threads and these interactions can be computed in parallel. Synchronization between execution on the GPU and on the CPU is done by the slave schedulers. The general structure of the GPU is shown in Fig. 3 [6].



Fig. 3. General structure of the GPU
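
The two-stage task selection described in this section (SJF ordering, Round Robin among tasks with equal execution time, then GPU or CPU assignment) can be sketched as follows. This is an illustration, not the author's implementation: the task tuples, the `interaction_threshold` parameter and the example values are hypothetical, since the paper does not say how "many interactions" is quantified.

```python
from itertools import groupby


def plan(tasks, interaction_threshold):
    """Order tasks and pick a device for each.

    tasks: list of (name, est_time, interactions) tuples.
    Returns a list of (name, device, policy) in execution order.
    """
    # Stage 1: Shortest Job First. sorted() is stable, so equal times keep input order.
    ordered = sorted(tasks, key=lambda t: t[1])
    schedule = []
    for _, group in groupby(ordered, key=lambda t: t[1]):
        group = list(group)
        # Tasks with equal execution time are marked for Round Robin among themselves.
        policy = "RR" if len(group) > 1 else "SJF"
        for name, _, interactions in group:
            # Stage 2: interaction-heavy tasks go to the GPU, the rest stay on the CPU.
            device = "GPU" if interactions >= interaction_threshold else "CPU"
            schedule.append((name, device, policy))
    return schedule


if __name__ == "__main__":
    tasks = [("opamp", 5, 900), ("bias", 2, 10), ("mirror", 2, 400), ("comp", 9, 50)]
    for entry in plan(tasks, interaction_threshold=300):
        print(entry)
```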

GPU has the following types of memory:

- Constant
- Texture
- Global
- Local

It consists of the following components:

- One set of registers per processor
- Shared memory that is used by all processors
- A read-only cache, shared by all processors, that speeds up reads from constant memory
- A read-only cache, shared by all processors, that speeds up reads from texture memory

Global and local memories are not cached, and they are used for both read and write operations. Access to the local memory is faster than access to the global memory, but the local memory is smaller than the global memory. To reduce simulation time, it is preferable to store data in local memory if the data size is small. Reading or writing a single floating-point value from global memory takes 400 to 600 clock cycles. This latency can be hidden if there are other instructions that can be executed while waiting for the global memory access to complete.

The CUDA programming model is shown in Fig. 4. When a program is written in CUDA, the computing device is the GPU, which operates as a co-processor for the CPU and can execute a large number of threads in parallel. The code executed by each thread is called a kernel. A thread block contains several threads which can run in parallel, which helps to reduce the total simulation time. Every grid contains several blocks. This architecture allows the maximum number of threads to run in parallel, which reduces simulation time. Synchronization within a block is done in the following way: all threads are suspended until every thread reaches the synchronization point. The number of threads is the same in every block, and the block size is chosen by the programmer. Every thread has its own index, which can be viewed in the code as a one-, two- or three-dimensional value. This method sped up simulation by up to 2.5 times.



Fig. 4. GPU programming model
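
The grid, block and thread hierarchy can be illustrated with a sequential stand-in for a one-dimensional kernel launch. The sketch below only mimics the index arithmetic; it runs on the CPU, one "thread" after another, and the helper names `launch` and `scale_kernel` are invented for this example.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Sequential stand-in for a 1-D launch: call the kernel once per thread."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def scale_kernel(block_idx, block_dim, thread_idx, data, factor, out):
    # Same formula as blockIdx.x * blockDim.x + threadIdx.x in CUDA C.
    i = block_idx * block_dim + thread_idx
    if i < len(data):            # guard: the last block may have spare threads
        out[i] = data[i] * factor


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    out = [0.0] * len(data)
    block_dim = 4                                        # threads per block
    grid_dim = (len(data) + block_dim - 1) // block_dim  # blocks needed to cover the data
    launch(scale_kernel, grid_dim, block_dim, data, 2.0, out)
    print(out)
```

Because every block has the same size, the grid is rounded up to cover the data and the bounds check in the kernel discards the unused threads of the last block.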

In this paper, transistor equation evaluation is done on the GPU, which helps to reduce simulation time because the GPU provides many parallel threads that perform the evaluation in parallel. Most methods which perform IC simulation on clusters and supercomputers do not use shared memory, and the interconnections between processors increase simulation time. In the case of simulation on a GPU, there is shared memory, which helps to reduce simulation time.

Parallel simulation scheduling and synchronization are implemented by a scheduler which chooses the best scheduling algorithm depending on the task simulation times and sequences. With this method, simulation is sped up by up to 2.5 times.

#### VI. EXPERIMENTAL RESULTS

The experimental environment included an Intel 2.4 GHz CPU, 4 GB of memory, an NVIDIA 8800 GTS graphics card, CUDA SDK 2.0, the Visual Studio 2005 C++ programming platform, and the Windows 7 operating system. The results in Table I show that simulation is sped up by up to 2.5 times.

 TABLE I

 SIMULATION TIMES FOR GPU AND CPU

| Number of transistors | Total evaluations  | CPU-alone simulation time (s) | GPU+CPU simulation time (s) | Speed-up |
|-----------------------|--------------------|-------------------------------|-----------------------------|----------|
| 300                   | $1.4 \times 10^7$  | 44                            | 33                          | 1.3      |
| 1200                  | $2.3 \times 10^7$  | 102                           | 41                          | 2.5      |
| 1100                  | $4.5 \times 10^8$  | 550                           | 230                         | 2.4      |
| 510                   | $1.7 \times 10^7$  | 27                            | 20                          | 1.35     |
| 1000                  | $5.8 \times 10^7$  | 132                           | 74                          | 1.78     |
| 2020                  | $1.95 \times 10^8$ | 490                           | 220                         | 2.2      |
| 3100                  | $1.3 \times 10^7$  | 452                           | 235                         | 1.92     |
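
The speed-up column is the ratio of the two simulation times, which the short script below recomputes from the values in Table I. As an additional plausibility check that is not part of the paper, it also evaluates Amdahl's law for the 75% share of run time that the abstract attributes to transistor model evaluation; treating that share as the only accelerated part is an assumption of this sketch.

```python
# (transistors, CPU-alone time in s, GPU+CPU time in s) copied from Table I.
ROWS = [
    (300, 44, 33),
    (1200, 102, 41),
    (1100, 550, 230),
    (510, 27, 20),
    (1000, 132, 74),
    (2020, 490, 220),
    (3100, 452, 235),
]


def speedup(cpu_time, gpu_cpu_time):
    """Speed-up as the ratio of CPU-alone time to GPU+CPU time."""
    return cpu_time / gpu_cpu_time


def amdahl_limit(accelerated_fraction):
    """Upper bound on speed-up when only this fraction of the run time is accelerated."""
    return 1.0 / (1.0 - accelerated_fraction)


if __name__ == "__main__":
    for n, cpu, gpu in ROWS:
        print(f"{n:5d} transistors: speed-up {speedup(cpu, gpu):.2f}")
    print(f"Amdahl bound for 75% accelerated work: {amdahl_limit(0.75):.1f}")
```

Under that assumption the bound is a factor of 4, so the measured maximum of about 2.5 is consistent with it.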

#### VII. CONCLUSION

This paper presents a new parallel simulation method which speeds up simulation by up to 2.5 times. The method is implemented by master and slave schedulers. In the first stage, the circuit is divided into sub-circuits according to the hierarchical approach. Parallel simulation scheduling and synchronization are implemented in the master scheduler. A task which contains many interactions runs on the GPU, because the GPU provides many threads and these interactions can be computed in parallel. Synchronization between execution on the GPU and on the CPU is done by the slave schedulers. This helps to speed up simulation.

#### REFERENCES

- [1] Nagel, L., "SPICE2: A computer program to simulate semiconductor circuits," University of California, Berkeley, Memo UCB/ERL M520, May 1975.
- [2] Chen, R., "Solution of Large-scale Circuits by Partitioning", Proc. IEEE TENCON '82, Hong Kong, Dec. 1982, pp. 71-77.
- [3] Parkhurst, J., *"From single core to multi-core: preparing for a new exponential"* International conference on Computer Aided Design, Nov., 2006.
- [4] Owens, J., "GPU architecture overview," in SIGGRAPH '07: ACM SIGGRAPH 2007 courses, (New York, NY, USA), 2007, p. 2.
- [5] Chen, X., Wu, W., Wang, Y., Yu, H., Yang, H., "An EScheduler-based data dependence analysis and task scheduling for parallel circuit simulation", IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 58, No. 10, Oct. 2011, pp. 702–706.
- [6] Luebke, D., Harris, M., Govindaraju, N., Lefohn, A., Houston, M., Owens, J., Segal, M., Papakipos, M., Buck, I., "GPGPU: general-purpose computation on graphics hardware" in SC '06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, (New York, NY, USA), p. 208.

## Synthesis of application specific processor architectures for ultra-low energy consumption

Tom J. Kazmierski and Charles Leech

Electronics and Computer Science, Faculty of Physical Sciences and Engineering University of Southampton, SO17 1BJ, UK {tjk,cl19g10}@ecs.soton.ac.uk

Abstract—In this paper we suggest that further energy savings can be achieved by a new approach to synthesis of embedded processor cores, where the architecture is tailored to the algorithms that the core executes. In the context of embedded processor synthesis, both single-core and many-core, the types of algorithms and demands on the execution efficiency are usually known at the chip design time. This knowledge can be utilised at the design stage to synthesise architectures optimised for energy consumption. Firstly, we present an overview of both traditional energy saving techniques and new developments in architectural approaches to energy-efficient processing. Secondly, we propose a picoMIPS architecture that serves as an architectural template for energy-efficient synthesis. As a case study, we show how the picoMIPS architecture can be tailored to an energy efficient execution of the DCT algorithm.

#### I. INTRODUCTION

Much research has been recently devoted to the development of energy efficient technologies in single-core and many-core processor systems leading to further savings in power consumption. Both traditional power saving techniques as well as novel architectures, including heterogeneous many-core architectures and reconfigurable architectures have been developed. The new research has been stimulated largely by the fact that the introduction of multi-core structures to processor architectures caused a significant increase in the power consumption of these systems. In addition, the gap between the average power and peak power has widened as the level of core integration increases [1].

Many energy efficiency and power saving technologies are already integrated into processor architectures in order to reduce power dissipation and extend battery life, especially in mobile devices. A combination of technologies is most commonly implemented to achieve the best energy efficiency whilst still allowing the system to meet performance targets [2]. Techniques to increase energy efficiency can be applied at many development levels from architecture co-design and code compilation to task scheduling, run-time management and application design [3]. Traditional techniques include Dynamic Voltage and Frequency Scaling (DVFS), clock gating and clock distribution and power domains. DVFS is a technique used to control the power consumption of a processor through fine adjustment of the clock frequency and supply voltage levels [1][2][3][4]. High levels are used when meeting performance targets is a priority and low levels (known as CPU throttling) are used when energy efficiency is most important or high performance is not required. When the supply voltage is lowered and the frequency reduced, the execution of instructions by the processor is slower but performed more energy efficiently due to the extension of delays in the pipeline stages.
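
The energy benefit of DVFS can be illustrated with the standard first-order model of dynamic CMOS power, $P = \alpha C V_{dd}^2 f$. The sketch below is not taken from the paper: it ignores leakage power, and the capacitance, cycle count and the two operating points are hypothetical values chosen only to show the trend.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """Dynamic CMOS switching power: activity * C * Vdd^2 * f (watts)."""
    return activity * c_eff * vdd ** 2 * freq


def task_energy(c_eff, vdd, freq, cycles, activity=1.0):
    """Energy to run a fixed number of clock cycles at one operating point (joules)."""
    run_time = cycles / freq
    return dynamic_power(c_eff, vdd, freq, activity) * run_time


if __name__ == "__main__":
    C_EFF = 1e-9    # hypothetical effective switched capacitance, farads
    CYCLES = 1e9    # hypothetical task length in clock cycles
    for vdd, freq in [(1.2, 1.0e9), (0.9, 0.5e9)]:   # hypothetical operating points
        energy = task_energy(C_EFF, vdd, freq, CYCLES)
        print(f"Vdd={vdd} V, f={freq / 1e9} GHz: "
              f"{CYCLES / freq:.1f} s, {energy:.2f} J")
```

In this model the energy of a fixed task depends on the supply voltage squared but not on the frequency, so the lower operating point takes twice as long and uses less energy. Lowering the frequency is what allows the voltage to be reduced.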

Further savings are achieved by the use of power domains, where regions of a system or a processor that are controlled from a single supply can be completely powered down in order to minimise power consumption without entirely removing the power supply to the system. Power domains can be used dynamically and in conjunction with clock gating. The ARM Cortex-A15 MPCore processor supports multiple power domains both for the core and for the surrounding logic [6]. Figure 1 shows these domains, labelled Processor and Non-Processor, that allow large parts of the processor to be deactivated. Smaller internal domains, such as CK\_GCLKCR, are implemented to allow smaller sections to be deactivated for finer performance and power variations.

Modelling and simulation of many-core processors is also an important area, as it allows a better understanding of the complex interactions that occur inside a system and cause power and energy consumption [9], [10], [11], [12], [13]. For example, the model created by Basmadjian et al. [10] is tailored for many-core architectures in that it accounts for resource sharing and power saving mechanisms.

In this paper we suggest that further energy savings can be achieved by a new approach to synthesis of embedded processor cores, where the architecture is tailored to the algorithms that the core executes. In the context of embedded processor synthesis, both single-core and many-core, the types of algorithms and demands on the execution efficiency are usually known at the chip design time. This knowledge can be utilised at the design stage. As a case study, we propose in section III a picoMIPS architecture that can be tailored to an energy efficient execution of the DCT algorithm.



Fig. 1: The ARM Cortex-A15 features multiple power domains for the core and surrounding logic, reprinted from [6].

#### II. RECENT DEVELOPMENTS IN ENERGY EFFICIENT ARCHITECTURES

#### A. Pipeline Balancing

Pipeline balancing (PLB) is now an established technique used to dynamically adjust the resources of the pipeline of a processor such that it retains performance while reducing power consumption [14]. Power balanced pipelines is a concept in which the power disparity of pipeline stages is reduced by assigning different delays to each microarchitectural pipestage while guaranteeing a certain level of performance/throughput ratio [15]. Static power balancing is performed during design time to identify power heavy circuitry in pipestages for which consumption remains fairly constant for different programs and reallocate cycle time accordingly. Dynamic power balancing is implemented on top of this to manage power fluctuations within each workload and further reduce the total power cost. Power savings are also greater at lower frequencies. The delay constraints on microarchitectural pipeline stages can be modified in order to make them more power efficient, in a similar way to DVFS, when the performance demand of the application is relaxed [15]. PLB can also operate in response to instruction per cycle (IPC) variations within a program [14]. Here the PLB mechanism dynamically reduces the issue width of the pipeline to save power or increases it to boost throughput.
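
The IPC-driven adjustment of the issue width can be sketched as a simple controller with hysteresis. The widths and thresholds below are hypothetical defaults chosen for illustration; they are not the values used in [14].

```python
def next_issue_width(width, ipc, narrow=4, wide=8, low_ipc=2.0, high_ipc=3.0):
    """Choose the issue width for the next sampling window from the measured IPC.

    The gap between low_ipc and high_ipc provides hysteresis, so the width
    does not toggle on small IPC fluctuations.
    """
    if width == wide and ipc < low_ipc:
        return narrow     # little parallelism available: save power
    if width == narrow and ipc > high_ipc:
        return wide       # high-IPC program phase: restore throughput
    return width


if __name__ == "__main__":
    width = 8
    for ipc in [3.5, 1.5, 1.8, 3.2, 2.5]:   # hypothetical IPC samples
        width = next_issue_width(width, ipc)
        print(f"IPC {ipc}: issue width {width}")
```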

#### B. Caches and Interconnects

It is not only the design of the processor's internal circuitry that is important in maintaining energy efficiency. Careful co-design of the interconnect, caches and the processor cores is required to achieve high performance and energy efficiency [16]. The high level of integration that is inherent in multiple-processor systems can be utilised to reduce the interconnect power consumption by improving cache coherence protocols [17]. An average of 16.3% of L2 cache accesses could be optimised and, as every access consumes time and power, an average 9.3% power reduction is recorded while increasing system performance by 1.4% [17]. Recently a new methodology has been proposed [10] for estimating the power consumption of multicore processors. It takes into account resource sharing and power saving mechanisms on top of the power consumption of each core.

#### C. Energy Efficiency techniques in Heterogeneous Multicore Architectures

A heterogeneous or asymmetric multi-core architecture is composed of cores of varying size and complexity which are designed to complement each other in terms of performance and energy efficiency [8]. A typical system will implement a small core to process simple tasks, in an energy efficient way, while a larger core provides higher performance processing for when computationally demanding tasks are presented. The cores represent different points in the power/performance design space and significant energy efficiency benefits can be achieved by dynamically allocating application execution to the most appropriate core [18]. A task matching or switching system is also implemented to intelligently assign tasks to cores; balancing a performance demand against maintaining system energy efficiency. These systems are particularly good at saving power whilst handling a diverse workload where fluctuations of high and low computational demand are common [19].

A heterogeneous architecture can be created in many different ways and many alternatives have been developed due to the heavy research interest in this area. Modifications to general purpose processors, such as asymmetric core sizes [13], custom accelerators [20], varied cache sizes [21] and heterogeneity within each core [22][7], have all been demonstrated to introduce heterogeneous features into a system.

One of the most prominent and successful heterogeneous architectures to date is the ARM big.LITTLE system. This is a production example of a heterogeneous multiprocessor system consisting of a compact and energy efficient "LITTLE" Cortex-A7 processor coupled with a higher performance "big" Cortex-A15 processor [19]. The system is designed with the dynamic usage patterns of modern smart phones in mind where there are typically periods of high intensity processing followed by longer periods of low intensity processing [23]. Low intensity tasks, such as texting and audio, can be handled by the A7 processor enabling a mobile device to save battery life. When a period of high intensity occurs, the A15 processor can be activated to increase the system's throughput and meet tighter performance deadlines. A power saving of up to 70% is advertised for a light workload, where the A7 processor can handle all of the tasks, and a 50% saving for medium workloads where some tasks will require allocation to the A15 processor.
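The task-matching idea behind such systems can be sketched as follows: among the cores that meet the deadline, choose the one that spends the least energy. This is a minimal sketch and not ARM's scheduler; the `Core` type, the `pick_core` function and all throughput and power figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    ops_per_second: float  # assumed sustained throughput
    power_watts: float     # assumed active power


def pick_core(cores, task_ops, deadline_s):
    """Return the core that meets the deadline with the lowest energy."""
    feasible = [c for c in cores if task_ops / c.ops_per_second <= deadline_s]
    if not feasible:
        # No core meets the deadline: fall back to the fastest one.
        return max(cores, key=lambda c: c.ops_per_second)
    # Energy = power * execution time.
    return min(feasible, key=lambda c: c.power_watts * task_ops / c.ops_per_second)


# Illustrative numbers only, not measured Cortex-A7/A15 figures.
LITTLE = Core("little", ops_per_second=1e9, power_watts=0.2)
BIG = Core("big", ops_per_second=3e9, power_watts=1.5)
```

Under these assumptions a light task (1e8 operations, 0.5 s deadline) is placed on the small core, while a heavy task (2e9 operations, 1 s deadline) can only be served by the large core.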

Kumar et al present an alternative implementation where two architectures from the Alpha family, the EV5 and EV6, are combined to be more energy and area efficient than a homogeneous equivalent [8][18]. They demonstrate that a much higher throughput can be achieved due to the ability of a heterogeneous multi-core architecture to better exploit changes in thread-level parallelism as well as inter- and intra-thread diversity [8]. In [18], they evaluate the system in terms of its power efficiency, indicating a 39% average energy reduction for only a 3% performance drop.

Composite Cores is a microarchitectural design that reduces the migration overhead of task switching by bringing heterogeneity inside each individual core [22]. The design contains 2 separate backend modules, called $\mu$Engines, one of which features a deeper and more complex out-of-order pipeline, tailored for higher performance, while the other features a smaller, compact in-order pipeline designed with energy efficiency in mind. Due to the high level of hardware resource sharing and the small $\mu$Engine state, the migration overhead is brought down from the order of 20,000 instructions to 2000 instructions. This greatly reduces the energy expenditure associated with migration and also allows more of the task to be run in an efficient mode. Their results show that the system can achieve an energy saving of 18% using dynamic task migration whilst only suffering a 5% performance loss.

Fig. 2: The microarchitecture for Composite Cores, featuring two $\mu$Engines, reprinted from [22].

Using both a heterogeneous architecture and hardware reconfiguration, a technique called Dynamic Core Morphing (DCM) is developed by Rodrigues et al to allow the shared hardware of a few tightly coupled cores to be morphed at run-time [7]. The cores all feature a baseline configuration but reconfiguration can trigger the re-assignment of high performance functional units to different cores to speed up execution. The efficiency of the system can lead to performance per watt gains of up to 43% and an average saving of 16% compared to a homogeneous static architecture.

The energy efficiency benefits of heterogeneity can only be exploited with the correct assignment of tasks or applications to each core [9][24][25][26][12]. Tasks must be assigned in order to maximise energy efficiency whilst ensuring performance deadlines are met. Awan et al perform scheduling in two phases to improve energy efficiency: task allocation to minimise active energy consumption, and exchange of higher energy states for lower, more energy efficient sleep states [9]. Alternatively, Calcado et al propose division of tasks into m-threads to introduce fine-grain parallelism below thread level [27]. Moreover, Saha et al include power and temperature models in an adaptive task partitioning mechanism in order to allocate tasks according to their actual utilisations rather than based on a worst case execution time [12]. Simulation results confirm that the mechanism is effective, reducing energy consumption by 55% and task migrations by 60% over alternative task partitioning schemes.

Task assignment can also be performed in response to program phases which naturally occur during execution when the resource demands of the application change. Phase detection is used by Jooya and Analoui to dynamically re-assign programs for each phase to improve the performance and power dissipation of heterogeneous multi-core processors [25]. Programs are profiled in dynamic time intervals in order to detect phase changes. Sawalha et al also propose an online scheduling technique that dynamically adjusts the program-to-core assignment as application behaviour changes between phases with an aim to maximise energy efficiency [26]. Simulated evaluation of the scheduler shows that energy savings of 16% on average and reductions of up to 29% in energy-delay product can be achieved compared to static assignments.
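Interval-based phase detection can be illustrated with a very small sketch. It flags a phase change whenever the IPC measured in one profiling interval differs from the previous interval by more than a relative threshold. The single-metric criterion, the function name `detect_phase_changes` and the 25% threshold are assumptions made for this example; the detectors in [25] and [26] are more elaborate.

```python
def detect_phase_changes(ipc_samples, threshold=0.25):
    """Return indices of intervals whose IPC jumps relative to the previous one.

    ipc_samples: positive per-interval IPC measurements (hypothetical input).
    threshold:   relative change that counts as a new phase (assumed value).
    """
    changes = []
    for i in range(1, len(ipc_samples)):
        previous, current = ipc_samples[i - 1], ipc_samples[i]
        if abs(current - previous) > threshold * previous:
            changes.append(i)
    return changes
```

A scheduler could re-evaluate the program-to-core assignment only at the returned indices instead of at every interval.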

#### D. Energy Efficiency techniques in Reconfigurable Multicore Architectures

Reconfigurability is another property that has the potential to increase the energy and area efficiency of processors and systems on chip by introducing adaptability and hardware flexibility into the architecture. Building on the innovations that heterogeneous architectures bring, reconfigurable architectures aim to achieve both energy efficiency and high performance but within the same processor and therefore meet the requirements of many embedded systems. The flexible heterogeneous Multi-Core processor (FMC) is an example of the fusion of these two architectures that can deliver both a high throughput for uniform parallel applications and high performance for fluctuating general purpose workloads [28]. Reconfigurable architectures are dynamic, adjusting their complexity, speed and performance level in response to the currently executing application. With this property in mind, we disregard systems that are statically reconfigurable but fixed while operating, such as traditional FPGAs, considering only architectures that are run-time reconfigurable.

#### E. Dynamic Partial Reconfiguration

FPGA manufacturers such as Xilinx and Altera now offer a mechanism called Dynamic Partial Reconfiguration (DPR) [29] or Dynamic Partial Self-Reconfiguration (DPSR) [30] to enable reconfiguration during run-time of the circuits within an FPGA, allowing a region of the design to change dynamically while other areas remain active [31]. The FPGA's architecture is partitioned into a static region consisting of fixed logic, control circuits and an embedded processor that controls and monitors the system. The rest of the design space is allocated to a dynamic/reconfigurable region containing a reconfigurable logic fabric that can be formed into any circuit whenever hardware acceleration is required.

DPR/DPSR presents energy efficiency opportunities over fixed architectures. DPR enables the system to react dynamically to changes in the structure or performance and power constraints of the application, allowing it to address inefficiencies in the allocation of resources and more accurately implement changing software routines as dynamic hardware accelerators [29]. These circuits can then be easily removed or gated when they are no longer required to reduce power consumption [32]. DPR can also increase the performance of an FPGA based system because it permits the continued operation of portions of the dynamic region unaffected by reconfiguration tasks. Therefore, it allows multiple applications to be run in parallel on a single FPGA [30]. This property also improves the hardware efficiency of the system as, where separate devices were required, different tasks can now be implemented on a single FPGA, reducing power consumption and board dimensions. In addition, DPR reduces reconfiguration times because only small modifications are made to the bitstream over time and the entire design does not need to be reloaded for each change.

A study into the power consumption patterns of DPSR programming was conducted by Bonamy et al [11] to investigate to what degree the sharing of silicon area between multiple accelerators will help to reduce power consumption. However, many parameters must be considered to assess whether the performance improvement outweighs preventative factors such as reconfiguration overhead, accelerator area and idle power consumption, and as such any gain can be difficult to evaluate. Their results show complex variations in power usage at different stages during reconfiguration that are dependent on factors like the previous configuration and the contents of the configured circuit. In response to these experiments, three power models are proposed to help analyse the tradeoff between implementing tasks as dynamically reconfigurable, in static configuration or in full software execution.

Despite clear benefits, several disadvantages become apparent with this form of reconfigurable technology. As was shown above, the power consumption overhead associated with programming new circuits can effectively impose a minimum size or usage time on circuits for their implementation to be justified. In addition, a baseline power and area cost is always incurred due to the large static region, which continuously consumes power and can contain unnecessary hardware. Finally, the FPGA interconnect reduces the speed and increases the power consumption of the circuit compared to an ASIC implementation because of the increased gate count required to give the system flexibility.
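The minimum usage requirement can be made concrete with a deliberately simplified energy model, which is an assumption made here for illustration and not one of the three models proposed in [11]. Let $E_{rc}$ be the energy spent on loading the accelerator, $E_{hw}$ the energy per invocation on the accelerator and $E_{sw}$ the energy per invocation in software, with $E_{sw} > E_{hw}$. Reconfiguration pays off after $N$ invocations when

$$E_{rc} + N E_{hw} < N E_{sw} \quad\Longleftrightarrow\quad N > \frac{E_{rc}}{E_{sw} - E_{hw}}.$$

An accelerator that is replaced before reaching this number of invocations costs more energy than the software routine it was meant to speed up. The model ignores the idle power of the static region, which would raise the break-even point further.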

#### F. Composable and Partitionable Architectures

Partitioning and composition are techniques employed by some dynamically reconfigurable systems to provide adaptive parallel granularity [33]. Composition involves synthesising a larger logical processor from smaller processing elements when higher performance computation or greater instruction or thread level parallelism (ILP or TLP) is required. Partitioning on the other hand will divide up a large design in the most appropriate way and assign shared hardware resources to individual cores to meet the needs of an application.

Composable Lightweight Processors (CLP) is an example of a flexible architectural approach to designing a Chip Multiprocessor (CMP) where low-power processor cores can be aggregated together dynamically to form larger single-threaded processors [33]. The system has an advantage over other reconfigurable techniques in that there are no monolithic structures spanning the cores, which instead communicate using a microarchitectural protocol. In tests against a fixed-granularity processor, the CLP has been shown to provide a 42% performance improvement whilst being on average 3.4 times as area efficient and 2 times as power efficient.

Core Fusion is a similar technique to CLP in that it allows multiple processors to be dynamically allocated to a single instruction window and operated as if they were one larger processor [34]. The main difference from CLP is that Core Fusion operates on conventional RISC or CISC ISAs, giving it an advantage over CLP in terms of compatibility. However, this also requires that the standard structures in these ISAs are present and so can limit the scalability of the architecture.

#### G. Coarse Grained Reconfigurable Array Architectures

Coarse-Grained Reconfigurable Array (CGRA) architectures represent an important class of programmable systems that act as an intermediate state between fixed general purpose processors and fine-grain reconfigurable FPGAs. They are designed to be reconfigurable at a module or block level rather than at the gate level in order to trade off flexibility for reduced reconfiguration time [35].

One example of a CGRA designed with energy efficiency as the priority is the Ultra Low Power Samsung Reconfigurable Processor (ULP-SRP) presented by Changmoo et al [36]. Intended for biomedical applications as a mobile healthcare solution, the ULP-SRP is a variation of the ADRES processor [37] and uses 3 run-time switchable power modes and automatic power gating to optimise the energy consumption of the device. Experimental results when running a low power monitoring application show a 46.1% energy consumption reduction compared to previous works.

#### III. CASE STUDY - PICOMIPS

The picoMIPS architecture proposed here is a RISC microprocessor with a minimised instruction set architecture (ISA). Each implementation will contain only the necessary datapath elements in order to maximise area efficiency as the priority. For example, the instruction decoder will only recognise instructions that the user specifies and the ALU will only perform the required logic or arithmetic functions. Due to the correlation between logic gate count and power consumption, energy efficiency is also maximised in the processor; the system is therefore designed to perform a specific task in the most efficient processor-based form.

By synthesising the picoMIPS as a microprocessor, a baseline configuration is established upon which functionality can be added or removed, in the form of instructions or functions, while incurring only minimal changes to the area consumption of the design. If the task was implemented as a specific dedicated hardware circuit, any changes to the functionality could have a large influence on the area consumption of the design. Figure 3 shows an example configuration for the picoMIPS which can accommodate the majority of the simple RISC instructions. It is a Harvard architecture, with separate program and data memories, although the designer may choose to exclude a data memory entirely. The user can also specify the widths of each data bus to avoid unnecessary opcode bits from wasting logic gates.
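The tailoring step can be sketched in a few lines: collect the mnemonics that a program actually uses, assign opcodes only to those, and size the opcode field accordingly. This is an illustrative sketch and not the picoMIPS synthesis flow; the function name `tailor_isa`, the assembly syntax and the mnemonics are hypothetical.

```python
def tailor_isa(program):
    """Derive a minimal opcode map and opcode width from an assembly listing.

    program: list of assembly lines, mnemonic first (assumed format).
    Returns (opcodes, opcode_bits).
    """
    # Keep only the instructions the application really needs.
    used = sorted({line.split()[0].upper() for line in program if line.strip()})
    # Smallest field that can encode len(used) distinct opcodes (at least 1 bit).
    opcode_bits = max(1, (len(used) - 1).bit_length())
    opcodes = {mnemonic: code for code, mnemonic in enumerate(used)}
    return opcodes, opcode_bits


# Hypothetical program fragment, for illustration only.
example_program = ["ADD r1, r2", "MUL r3, r1", "add r4, r4", "LD r1, 0", "JMP 0"]
```

For the example fragment only four distinct instructions remain, so a 2-bit opcode field is sufficient and the decoder and ALU need to support nothing else.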

The picoMIPS has also been implemented to perform the DCT and inverse DCT (IDCT) in a multi-core context [38]. A homogeneous architecture was deployed with the same single core structure, as in figure 3, being replicated 3 times. The cores are connected via a data bus to a distribution module as shown in figure 4 where block data is transferred to each core in turn. This structure theoretically triples the throughput of the system as it can process multiple data blocks in parallel.
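The distribution scheme can be modelled in software as a round-robin assignment of data blocks to cores, each of which applies the transform to its own blocks. The sketch below is an assumption-laden illustration and not the implementation of [38]: it uses a floating-point, orthonormal 1-D DCT-II, and the names `dct_1d` and `distribute` are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal 1-D DCT-II of a sequence (reference model, not fixed-point)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n))
        out.append(scale * total)
    return out


def distribute(blocks, num_cores=3):
    """Hand blocks to the cores in turn, as a distribution module would."""
    queues = [[] for _ in range(num_cores)]
    for index, block in enumerate(blocks):
        queues[index % num_cores].append(block)
    return queues
```

With three cores and a steady supply of blocks, each core receives every third block, which is where the theoretical threefold throughput gain comes from; the distribution bus itself is not modelled here.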

As a microprocessor architecture, the picoMIPS can implement many of the technologies discussed in the Introduction to improve energy efficiency. Clock gating, power domains and DVFS will all benefit the system; however, the area overhead of implementing them must first be considered. Pipeline balancing and caching can be integrated into more complex picoMIPS architectures; however, these are performance focused improvements and so are not priorities in the picoMIPS concept. The expansion of the system to multi-core is also one that can be employed to improve performance. Moreover, a heterogeneous architecture could be implemented to allow the picoMIPS to process multiple different applications simultaneously using several tailored ISAs. Reconfigurability can also be applied to picoMIPS to create an architecture which can be specific to each application that is executed, effectively creating a general purpose yet application specific processor. This property would require run-time synthesis algorithms to detect and develop the instructions and functional units that are required, before executing the application.

#### IV. CONCLUSION

The principles of the picoMIPS processor have been implemented in a few undergraduate projects to demonstrate the concept of minimal architecture synthesis and how it can be used to produce an application specific, energy efficient processor. A number of examples were used to demonstrate the validity of this approach in both single-core and many-core designs. In addition to the discrete cosine transform (DCT) algorithm presented above, a stage in JPEG compression was synthesised for FPGA implementation into a processor architecture based on the picoMIPS concept, as well as various image manipulation algorithms. Evaluation of results from this work still continues but it is evident that the resulting processors are more area efficient than corresponding FPGA soft-cores or a GPP due to the removal of unnecessary circuitry. Such synthesised processors can also be compared to a dedicated ASIC hardware implementation. An ASIC implementation is likely to have a much higher performance and data throughput; however, this comes at the cost of area and energy efficiency. The picoMIPS therefore represents a balance between the two, sacrificing some performance for area and energy efficiency benefits.

Fig. 3: An example implementation of the picoMIPS architecture.

Fig. 4: A Multi-core implementation of the picoMIPS architecture.

#### References

- [1] C. Isci, A. Buyuktosunoglu, C.-Y. Chen, P. Bose, and M. Martonosi, "An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget," in *Microarchitecture*, 2006. MICRO-39. 39th Annual IEEE/ACM International Symposium on, 2006, pp. 347–358.
- [2] V. Hanumaiah and S. Vrudhula, "Energy-efficient Operation of Multi-core Processors by DVFS, Task Migration and Active Cooling," *Computers, IEEE Transactions on*, vol. PP, no. 99, pp. 1–1, 2012.
- [3] B. de Abreu Silva and V. Bonato, "Power/performance optimization in FPGA-based asymmetric multi-core systems," in Field Programmable Logic and Applications (FPL), 2012 22nd International Conference on, 2012, pp. 473–474.
- [4] K. Wonyoung, M. Gupta, G.-Y. Wei, and D. Brooks, "System level analysis of fast, per-core DVFS using on-chip switching regulators," in *High Performance Computer Architecture*, 2008. *HPCA 2008. IEEE 14th International Symposium on*, 2008, pp. 123–134.

- [5] P. Bassett and M. Saint-Laurent, "Energy efficient design techniques for a digital signal processor," in *IC Design Technology* (*ICICDT*), 2012 IEEE International Conference on, 2012, pp. 1–4.
- [6] ARM, ARM Cortex-A15 MPCore Processor Technical Reference Manual, ARM, June 2013, pages 53 - 63. [Online]. Available: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0438i/DDI0438I\_cortex\_a15\_r4p0\_trm.pdf
- [7] R. Rodrigues, A. Annamalai, I. Koren, S. Kundu, and O. Khan, "Performance Per Watt Benefits of Dynamic Core Morphing in Asymmetric Multicores," in *Parallel Architectures and Compilation Techniques (PACT)*, 2011 International Conference on, 2011, pp. 121–130.
- [8] R. Kumar, D. Tullsen, P. Ranganathan, N. Jouppi, and K. Farkas, "Single-ISA heterogeneous multi-core architectures for multithreaded workload performance," in *Computer Architecture*, 2004. Proceedings. 31st Annual International Symposium on, 2004, pp. 64–75.
- [9] M. Awan and S. Petters, "Energy-aware partitioning of tasks onto a heterogeneous multi-core platform," in *Real-Time and Embedded Technology and Applications Symposium (RTAS)*, 2013 IEEE 19th, 2013, pp. 205–214.
- [10] R. Basmadjian and H. De Meer, "Evaluating and modeling power consumption of multi-core processors," in *Future Energy Systems: Where Energy, Computing and Communication Meet (e-Energy)*, 2012 Third International Conference on, 2012, pp. 1–10. [Online]. Available: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6221107
- [11] R. Bonamy, D. Chillet, S. Bilavarn, and O. Sentieys, "Power consumption model for partial and dynamic reconfiguration," in *Reconfigurable Computing and FPGAs (ReConFig)*, 2012 International Conference on, 2012, pp. 1–8.
- [12] S. Saha, J. Deogun, and Y. Lu, "Adaptive energy-efficient task partitioning for heterogeneous multi-core multiprocessor realtime systems," in *High Performance Computing and Simulation* (HPCS), 2012 International Conference on, 2012, pp. 147–153.
- [13] D. Woo and H.-H. Lee, "Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era," *Computer*, vol. 41, no. 12, pp. 24–31, 2008.
- [14] R. Bahar and S. Manne, "Power and energy reduction via pipeline balancing," in *Computer Architecture*, 2001. Proceedings. 28th Annual International Symposium on, 2001, pp. 218–229.
- [15] J. Sartori, B. Ahrens, and R. Kumar, "Power balanced pipelines," in *High Performance Computer Architecture* (HPCA), 2012 IEEE 18th International Symposium on, 2012, pp. 1–12.
- [16] R. Kumar, V. Zyuban, and D. Tullsen, "Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling," in *Computer Architecture*, 2005. ISCA '05. Proceedings. 32nd International Symposium on, 2005, pp. 408–419.
- [17] H. Zeng, J. Wang, G. Zhang, and W. Hu, "An interconnect-aware power efficient cache coherence protocol for CMPs," in *Parallel and Distributed Processing*, 2008. IPDPS 2008. IEEE International Symposium on, 2008, pp. 1–11.
- [18] R. Kumar, K. Farkas, N. Jouppi, P. Ranganathan, and D. Tullsen, "Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction," in *Microarchitecture*, 2003. MICRO-36. Proceedings. 36th Annual IEEE/ACM International Symposium on, 2003, pp. 81–92.
- [19] P. Greenhalgh, "big.LITTLE Processing with ARM Cortex-A15 & Cortex-A7," ARM, Tech. Rep., September 2011.
- [20] H. M. Waidyasooriya, Y. Takei, M. Hariyama, and M. Kameyama, "FPGA implementation of heterogeneous multicore platform with SIMD/MIMD custom accelerators," in *Circuits and Systems (ISCAS)*, 2012 IEEE International Symposium on, 2012, pp. 1339–1342.
- [21] B. de Abreu Silva, L. Cuminato, and V. Bonato, "Reducing the overall cache miss rate using different cache sizes for Heterogeneous Multi-core Processors," in *Reconfigurable Computing and FPGAs (ReConFig), 2012 International Conference on,* 2012, pp. 1–6.
- [22] A. Lukefahr, S. Padmanabha, R. Das, F. Sleiman, R. Dreslinski, T. Wenisch, and S. Mahlke, "Composite Cores: Pushing Heterogeneity Into a Core," in *Microarchitecture (MICRO), 2012* 45th Annual IEEE/ACM International Symposium on, 2012, pp. 317–328.

- [23] B. Jeff, "Advances in big.LITTLE Technology for Power and Energy Savings," ARM, Tech. Rep., September 2012.
- [24] S. Zhang and K. Chatha, "Automated techniques for energy efficient scheduling on homogeneous and heterogeneous chip multi-processor architectures," in *Design Automation Conference, 2008. ASPDAC 2008. Asia and South Pacific*, 2008, pp. 61–66.
- [25] A. Z. Jooya and M. Analoui, "Program phase detection in heterogeneous multi-core processors," in *Computer Conference*, 2009. CSICC 2009. 14th International CSI, 2009, pp. 219–224.
- [26] L. Sawalha and R. Barnes, "Energy-Efficient Phase-Aware Scheduling for Heterogeneous Multicore Processors," in *Green Technologies Conference*, 2012 IEEE, 2012, pp. 1–6.
- [27] F. Calcado, S. Louise, V. David, and A. Merigot, "Efficient Use of Processing Cores on Heterogeneous Multicore Architecture," in Complex, Intelligent and Software Intensive Systems, 2009. CISIS '09. International Conference on, 2009, pp. 669–674.
- [28] M. Pericas, A. Cristal, F. Cazorla, R. Gonzalez, D. Jimenez, and M. Valero, "A Flexible Heterogeneous Multi-Core Architecture," in *Parallel Architecture and Compilation Techniques*, 2007. PACT 2007. 16th International Conference on, 2007, pp. 13–24.
- [29] M. Santambrogio, "From Reconfigurable Architectures to Self-Adaptive Autonomic Systems," in Computational Science and Engineering, 2009. CSE '09. International Conference on, vol. 2, 2009, pp. 926–931.
- [30] J. Zalke and S. Pandey, "Dynamic Partial Reconfigurable Embedded System to Achieve Hardware Flexibility Using 8051 Based RTOS on Xilinx FPGA," in Advances in Computing, Control, Telecommunication Technologies, 2009. ACT '09. International Conference on, 2009, pp. 684–686.
- [31] S. Bhandari, S. Subbaraman, S. Pujari, F. Cancare, F. Bruschi, M. Santambrogio, and P. Grassi, "High Speed Dynamic Partial Reconfiguration for Real Time Multimedia Signal Processing," in *Digital System Design (DSD)*, 2012 15th Euromicro Conference on, 2012, pp. 319–326.
- [32] S. Liu, R. Pittman, A. Forin, and J.-L. Gaudiot, "On energy efficiency of reconfigurable systems with run-time partial reconfiguration," in *Application-specific Systems Architectures and Processors (ASAP)*, 2010 21st IEEE International Conference on, 2010, pp. 265–272.
- [33] K. Changkyu, S. Sethumadhavan, M. S. Govindan, N. Ranganathan, D. Gulati, D. Burger, and S. Keckler, "Composable Lightweight Processors," in *Microarchitecture*, 2007. MICRO 2007. 40th Annual IEEE/ACM International Symposium on, 2007, pp. 381–394.
- [34] E. Ipek, M. Kirman, N. Kirman, and J. F. Martinez, "Core fusion: accommodating software diversity in chip multiprocessors," in *Proceedings of the 34th annual* international symposium on Computer architecture, ser. ISCA '07. New York, NY, USA: ACM, 2007, pp. 186–197. [Online]. Available: http://doi.acm.org/10.1145/1250662.1250686
- [35] Z. Rakossy, T. Naphade, and A. Chattopadhyay, "Design and analysis of layered coarse-grained reconfigurable architecture," in *Reconfigurable Computing and FPGAs (ReConFig)*, 2012 International Conference on, 2012, pp. 1–6.
- [36] K. Changmoo, C. Mookyoung, C. Yeongon, M. Konijnenburg, R. Soojung, and K. Jeongwook, "ULP-SRP: Ultra low power Samsung Reconfigurable Processor for biomedical applications," in *Field-Programmable Technology (FPT)*, 2012 International Conference on, 2012, pp. 329–334.
- [37] F. J. Veredas, M. Scheppler, W. Moffat, and B. Mei, "Custom implementation of the coarse-grained reconfigurable ADRES architecture for multimedia purposes," in *Field Programmable Logic and Applications*, 2005. International Conference on, 2005, pp. 106–111.
- [38] G. Liu, "FPGA implementation of 2D-DCT/IDCT algorithm using multi-core picoMIPS," Master's thesis, University of Southampton, School of Electronics and Computer Science, September 2013.

### New Fault Tolerant Design Methodology Applied to Middleware Switch Processor

Vladimir Petrovic, Marko Ilic, Gunter Schoof and Zoran Stamenkovic

*Abstract* - This paper presents a new fault tolerant design methodology which provides protection against the three most important radiation effects – single event transients (SET), single event upsets (SEU) and single event latchup (SEL). SETs and SEUs are mitigated using hardware redundancy. Protection against SEL effects is provided by a specially designed SEL power protection cell. The combination of different protection techniques is the basis for the new fault tolerant design methodology. The Middleware Switch processor, which is the main part of the Spacecraft Area Network, is implemented using the presented design methodology, and its implementation characteristics are discussed.

*Keywords* – fault tolerant design, ASIC, single event effects, radiation effects, redundancy, design methodology.

#### I. INTRODUCTION

The main requirement of space and safety-critical systems is high reliability. In environments where it is hard or even impossible to provide maintenance (space and military applications), it is very important to deploy circuits and systems which can tolerate faults. Practically, almost all SEU and SET fault-tolerant techniques are based on redundancy. The most commonly used types of redundancy are listed below:

- Hardware redundancy
- Information redundancy
- Time redundancy
- Software redundancy.

In this paper, only hardware redundancy will be discussed. As the technology allows for smaller transistors, the digital cells get smaller and, therefore, hardware redundancy becomes more popular. Hardware redundancy provides masking of faults and protects the circuit (or system) from failure. Common hardware-redundancy techniques are the triple modular redundancy (TMR) and the double modular redundancy (DMR).

Triple modular redundancy was first mentioned in the literature in 1956 by J. von Neumann [1]. The redundant circuit consists of three identical modules and a 3-input majority voter (Fig. 1.a). The voter's function is to pass the majority input value through to the output. Since we are dealing with digital circuits, the modules are memory elements such as flip-flops or latches. The main disadvantage of this technique is that the system fails in case of a faulty voter. Therefore, a new triple voting logic was developed to complete the circuit redundancy (Fig. 1.b). Each of the three voters is fed from the outputs of all three memory modules. This technique is known in the literature as full triple modular redundancy. A detailed analysis of triple modular redundancy is presented in [2].

Vladimir Petrovic, Marko Ilic, Gunter Schoof and Zoran Stamenkovic are with the Innovations for High Performance, IHP GmbH, Im Technologiepark 25, 15236 Frankfurt Oder, Germany, E-mail: {petrovic, schoof, stamenkovic}@ihp-microelectronics.com; markoilic2211@gmail.com



Fig. 1. Triple modular redundancy: a) single voting, b) full voting
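The majority voting described above can be modelled behaviourally in a few lines. This is a software sketch of the voting function only, not the cell-level implementation; the function names `majority` and `full_tmr` are chosen for this example.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority of three integer words."""
    return (a & b) | (b & c) | (a & c)


def full_tmr(modules):
    """Full TMR: three voters, each fed by the outputs of all three modules."""
    a, b, c = modules
    return tuple(majority(a, b, c) for _ in range(3))
```

A single corrupted module is outvoted by the other two, so `full_tmr((5, 5, 7))` yields three identical correct outputs, and a fault in one voter affects only one of the three output copies.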

In order to reduce the high hardware overhead produced by the TMR [3], [4] and keep the design reliability high, implementation can be performed using the double modular redundancy with self-voting [5]. A DMR circuit can be designed in two versions: a single-voter version and a double-voter version. Both versions are shown in Fig. 2. In the literature, a "C element" can also be found as the self-voting structure.



Fig. 2. Double modular redundancy with self-voting: a) single voting, b) full voting
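The behaviour of a C element used as a self-voter can be sketched as follows: the output follows the inputs only when both redundant copies agree and otherwise holds its previous value. This is a behavioural model of the generic Muller C-element under that assumption, not the specific cell of [5]; the class name `CElement` is chosen for this example.

```python
class CElement:
    """Behavioural model of a two-input C-element with state holding."""

    def __init__(self, initial=0):
        self.state = initial

    def update(self, a, b):
        # Agreeing inputs are passed on; a disagreement (one copy upset)
        # leaves the previously stored value unchanged.
        if a == b:
            self.state = a
        return self.state
```

A transient that flips only one of the two copies therefore does not propagate to the output, at the cost of relying on the stored value until both copies agree again.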

Regarding latchup effects (SEL – single event latchup), all the known techniques for latchup mitigation are classified in three main groups, which are discussed below.

The first group uses current sensors at the board level to detect the excessive current induced by the latchup. The power supply of the affected device is switched off and, after a pre-specified (long enough) period of time, re-established. This approach suffers from a serious drawback: the circuit state is destroyed and cannot be recovered. In addition, the board protection circuits must be designed with special care and the following requirements have to be met:

- Proper decoupling of ICs,
- Clamping of outputs with diodes when driving inductive loads,
- Clamping of inputs with diodes if the input signal exceeds the power supply voltage,
- Use of star grounds in high-current applications.

The second latchup mitigation approach [6] is based on the introduction of an epitaxial buried layer process and the reduction of the well resistivity. However, this modification incurs additional costs and may impact circuit performance (the breakdown voltage, for example).

The third latchup mitigation approach [7] uses guard rings (additional N-type and P-type regions) that break the parasitic bipolar transistor structure. This solution is very efficient but can result in excessive circuit area and, therefore, cost.

In order to have an automated design flow for fault-tolerant circuits, it is essential to design the specific components which are not present in standard or radiation-hardened design kits. Each component described in this paper provides protection against a particular effect. A circuit for latchup protection is described first. Details of the redundant circuits with separated power domains are presented in the following section. The fault-tolerant technique is applied to the main processor of the Spacecraft Area Network (SCAN), which is known in the literature as the Middleware Switch processor [8][9].

#### II. INTEGRATED LATCHUP PROTECTION

Based on the Latchup Protection Technology (LPT) [10], a circuit was developed which can be integrated in an ASIC as a power control cell. The idea is to control the current flow of smaller standard cell areas with integrated latchup protection circuits, instead of today's external LPT-based protection circuits. The most interesting advantage is the combination of redundant circuits, used for protection against upsets and transients, with high-current-flow protection circuits, used for protection against potentially destructive latchup effects. Redundancy provides stable circuit states during the latchup protection phase [11]. With LPT this was not the case. Therefore, the paper presents an approach which provides protection against upsets, transients and latchup effects without expensive technology changes.

The block diagram of the SPS cell is shown in Fig. 3. It consists of a current-flow sensor/driver, a feedback block, a control block and a communication interface to the SPS power network controller (PNC).

The most important component in the SPS circuit is the current sensor. It is at the same time a current driver, used to provide the power supply for the logic which needs to be protected against the latchup effect. It is basically a PMOS transistor with a wide channel, operated in the linear (ohmic) region. The feedback logic, shown in Fig. 3, provides information about the current-flow status in the controlled circuit (standard cells). If the sensor (CSD) detects a higher current than usual, it automatically provides the related signal, and the feedback circuit together with the control logic turns off the power supply of the controlled standard cells where the latchup or high-current flow occurred. The control-logic block communicates with the power network. The SPS cell informs the power network about the current status of the protection mode, i.e. whether protection has been activated due to a latchup effect. Conversely, once protection mode has been triggered, the power network tells the SPS cell when protection mode should be deactivated.



Fig. 3. Block diagram of SPS cell



Fig. 4. Block diagram of current sensor/driver transistor

At this point it is important to explain how a PMOS transistor provides enough power to the standard cells while operating in the ohmic region, and to note the difference in quality between a standard power supply and a PMOS-based power supply. From the output characteristics of the PMOS transistor at a constant gate-source voltage (2.5V), the linear region extends up to a drain-source voltage of 0.9V and a drain current of 1mA. If the standard cells (represented as the green block in Fig. 4) require more power, the goal would be to provide more current at a lower $U_{DS}$ voltage. This is not possible, because an increase of the current increases the $U_{DS}$ voltage, which directly lowers the supply voltage of the standard cells. Therefore, the only way is to find a trade-off between the current required by the supplied circuit (standard cells) and the voltage across the sensor/driver transistor.

The maximal current provided by the sensor/driver transistor of $5\mu$m channel width is between $250\mu$A and $750\mu$A. The maximal voltage drop is about 0.6V in the operational mode, when the transistor is used as a driver. Therefore, the power provided by this transistor is not more than $450\mu$W, which is enough to supply 2 flip-flops and a few combinational cells.
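As a quick check, the quoted power bound appears to be the product of the upper current limit and the maximal voltage drop stated above (the symbol names $I_{D,max}$ and $U_{DS,max}$ are introduced here for illustration only):

$$P_{max} = I_{D,max} \cdot U_{DS,max} = 750\,\mu\text{A} \cdot 0.6\,\text{V} = 450\,\mu\text{W}$$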



Fig. 5. Simplified SPS Schematic

The simplified SPS schematic is presented in Fig. 5. If the output pin Vdd1 is short-circuited, the transistor T5 (sensor/driver transistor) conducts more current than usual and the voltage between source and drain increases. This means that the voltage at the drain of the PMOS transistor T5 drops. The feedback line from the Vdd1 pin causes transistor T2 to activate when this voltage falls below the threshold voltage. The transistor T1 then automatically triggers the Tstart (active-low) output pin.

In order to wake up the power switch circuit (SPS) from the latchup protection mode, a pulse must be applied to the "Tstop" pin. This pulse should stop the current flow through the transistor T3 and set the gates of transistors T5 and T6 to the low voltage level. The transistor T5 then turns on and supplies power to the Vdd1 output pin. The feedback line deactivates the transistors T2 and T1, so that the "Tstart" pin is set to the high voltage level, which completes the latchup protection sequence.
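The trip and wake-up sequence can be summarised as a small behavioural model. This is only a sketch under stated assumptions: the class and attribute names are hypothetical, and the trip level is assumed to equal the upper driver current quoted earlier, which the source does not state.

```python
from dataclasses import dataclass

# Assumption: trip level set to the upper driver limit quoted in the text.
TRIP_CURRENT_UA = 750.0


@dataclass
class SpsCell:
    powered: bool = True  # True while T5 supplies Vdd1
    tstart_n: int = 1     # active-low latchup flag, 1 means idle

    def sense(self, load_current_ua: float) -> None:
        """Cut Vdd1 and assert Tstart when the load draws too much current."""
        if self.powered and load_current_ua > TRIP_CURRENT_UA:
            self.powered = False
            self.tstart_n = 0

    def tstop_pulse(self) -> None:
        """A pulse on Tstop re-enables T5 and releases Tstart."""
        self.powered = True
        self.tstart_n = 1
```

The model deliberately ignores the analog behaviour of T1 to T6 and only captures the two observable pins and the supply state.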



Fig. 6. Power Network Controller (PNC) block diagram

A very important block for correct SPS operation is the power network controller (PNC). The PNC is a digital subsystem which controls all latchup protection circuits (SPS cells) in the fault-tolerant digital system. It is designed to communicate with all SPS cells independently. It consists of a programmable counter and control circuits.

The programmable counter defines the duration of the latchup protection phase, and the control circuits provide the communication interface with the SPS cells. The complexity of the power network controller is directly related to the redundancy type, DMR or TMR. The block diagram of the PNC is shown in Fig. 6.
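The role of the programmable counter can be illustrated with a behavioural sketch. It assumes one independent countdown per reporting SPS cell and a synchronous `tick`; these details, like all names below, are illustrative assumptions rather than the actual PNC design.

```python
class PowerNetworkController:
    """Behavioural sketch: one down-counter per SPS cell in protection mode."""

    def __init__(self, protection_cycles: int):
        self.protection_cycles = protection_cycles  # programmable counter value
        self.remaining = {}                          # cell id -> cycles left

    def on_tstart(self, cell_id: str) -> None:
        """An SPS cell reported a latchup: start its protection interval."""
        self.remaining[cell_id] = self.protection_cycles

    def tick(self) -> list:
        """Advance one clock cycle and return the cells that should get Tstop."""
        released = []
        for cell_id in list(self.remaining):
            self.remaining[cell_id] -= 1
            if self.remaining[cell_id] <= 0:
                released.append(cell_id)
                del self.remaining[cell_id]
        return released
```

Keeping a separate counter per cell reflects the statement that the PNC communicates with all SPS cells independently; a real implementation might share one counter and store timestamps instead.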

#### **III. AUTOMATED SPS PLACEMENT**

A very important design automation step is the placement of SPS cells. In parallel with the placement of SPS cells, the power network controller (PNC) has to be integrated within the standard design steps. The approach for automated SPS cell placement used in the presented work is based on the Cadence Low-Power Implementation flow [12].

An example of a standard ASIC layout view with implemented power supply terminals is shown in Fig. 7. Power rings are not shown in the figure because they are not relevant for the SPS cell placement.



Fig. 7. Standard power network of an ASIC chip

To provide a basis for the SPS-based power network, information about the power consumption of a "row section" of the standard power network is required. The starting point for the row power estimation is the power consumption of one "row section". A "row section" is a group of standard cells placed in one row between two power stripes. The ROW1 and ROW2 sections can be seen in Fig. 7. Using the IHP 250nm technology, it is estimated that one row section draws around $400\mu$A. This value is well suited to the SPS cell described above.
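A row-section current estimate of this kind could be scripted as follows. This is a hedged sketch: the input format (row, x position, current per cell), the stripe coordinates and the use of the upper driver limit as a budget are all assumptions made for illustration, not part of the described flow.

```python
from bisect import bisect_right

# Assumption: budget per row section taken from the upper driver limit in the text.
SPS_MAX_UA = 750.0


def row_section_currents(cells, stripe_x):
    """Sum cell currents per row section.

    cells: iterable of (row, x_position, current_ua)
    stripe_x: sorted x positions of the power stripes
    A section is identified by (row, index of the stripe to its left).
    """
    totals = {}
    for row, x, current_ua in cells:
        section = (row, bisect_right(stripe_x, x) - 1)
        totals[section] = totals.get(section, 0.0) + current_ua
    return totals


def overloaded(totals, limit_ua=SPS_MAX_UA):
    """Return the sections whose summed current exceeds the SPS budget."""
    return sorted(s for s, current in totals.items() if current > limit_ua)
```

Sections reported by `overloaded` would need either a wider sensor/driver transistor or a denser stripe pitch; which option the actual flow uses is not stated in the text.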

An ASIC example which integrates SPS cells and redundancy (in this example DMR) is shown in Fig. 8. SEL power switch cells are placed exactly under the crossover points of power stripes and rows, where "filler" cells are usually placed. At the points where an SPS is placed, the power stripes and power rows are connected only through the SPS cell. An SPS has one output, the controlled power supply line, used for one of the redundant circuits. This requirement follows from the concept of having separate power supplies for the two netlists used for the DMR.

Power supply distribution is provided separately for the redundant components. Fig. 8 shows that the row supply is "broken" in the neighboring SPS cell. SPS cell **1a** and SPS cell **2a** are electrically independent, but because of the design rules it is important that wires are not floating. Therefore, the row power supply provided by SPS cell **1a** is broken "internally" in SPS cell **2a**. The same approach is used for the redundant power supply, provided by SPS cell **1b** and SPS cell **2b**.



Fig. 8. DMR based circuit with integrated SPS cells

Comparing the standard power network with the SPS-based power network, the most important difference is easy to see: one SPS cell provides power supplies for all redundancy levels and is localized in four "row sections". As mentioned before, the implementation is based on DMR. For example, if a latchup occurs in the standard cell "SC5\_a", the SPS cell **1a** detects a higher current than usual and switches off the complete "row section" pair. At the same time, the SPS cell **1a** informs the PNC that a latchup has been detected. A digital system which includes only the latchup protection technique cannot provide correct functionality during the latchup protection phase. Therefore, it is necessary to use specially designed redundant circuits which support the latchup protection [11].

#### IV. CASE STUDY: MIDDLEWARE SWITCH

Designing a new satellite system usually involves very complex problems. They can be divided into groups, and the best solution found on that basis. The first problem is the long development time of an avionics system. It is accompanied by huge costs, because of the long time required for defining new interface specifications. The development of very expensive board computers is the second very important problem. All devices in a satellite communicate with each other through the board computer. Therefore, for every new satellite (or space-related) system a new device configuration has to be defined and the board computer redesigned. These problems were addressed by the implementation of a SCAN network [8], [9]. The central part of the whole SCAN system is the MW switch processor.

This section provides the most important information on the implementation characteristics of the fault-tolerant middleware switch processor. Besides the power consumption and area of the fault-tolerant processor version, the implementation characteristics of the non-fault-tolerant design are presented too. This allows a comparison of two identical architectures implemented using two different design approaches. The comparison shows the effect that a fault-tolerant design has on the implementation characteristics. It is important to note that the test case is realized using a reduced version of the middleware switch processor.

In the fault-tolerant MW switch version, the complete hardware is doubled. The memory is protected against potential SEUs by EDAC, and against latchup by the same SPS approach as the standard cells. As the memory requires more power, a few SPS cells are connected in parallel. The power network controller is in this example implemented in the SGB25RH process without latchup protection [13]. It is possible to provide two parallel power network controllers with integrated latchup protection, but this is beyond the scope of this discussion.

To give a better view of the standard cell types used during implementation, Table I presents the occupied area for combinational and non-combinational cells. It is important to note that non-combinational cells comprise memory blocks and flip-flops. The test case is implemented without latch cells.

After netlist parsing and timing analysis of the new DMR-based netlist, an increase in power consumption and required silicon area can be observed. The implementation results of the DMR netlist together with the power network controller are presented in Table II.

TABLE I

AREA REGARDING CELL TYPE USING STANDARD DESIGN APPROACH

| Standard cell type      |                    | Required area [mm²] |
|-------------------------|--------------------|---------------------|
| Combinational Cells     |                    | 4.116               |
| Non-combinational Cells | Cache Memory –     | 0.813               |
|                         | data + instruction | 3.165               |
|                         | S3P FIFO           | 11.626              |
|                         | Sequential Cells   | 5.012               |
| TOTAL                   |                    | 24.876              |

TABLE II

AREA REGARDING CELL TYPE USING FT DESIGN APPROACH

| Standard cell type      |                    | Required area [mm²] |
|-------------------------|--------------------|---------------------|
| Combinational Cells     |                    | 8.232               |
| Non-combinational Cells | Cache Memory –     | 1.626               |
|                         | data + instruction | 6.330               |
|                         | S3P FIFO           | 23.525              |
|                         | Sequential Cells   | 39.903              |
| TOTAL                   |                    | 79.616              |

The main reason for the area and power overhead is the power network controller. SPS cells do not introduce any area overhead because they are implemented under the power stripes, where filler cells are usually placed. This is an important result because 21335 SPS cells are placed in this example.
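The area figures in Tables I and II can be compared directly. The short sketch below copies three rows from the tables and computes the growth factor of the fault-tolerant design; the dictionary and function names are illustrative.

```python
# Values copied from Table I (standard design) and Table II (FT design), in mm².
AREA_STANDARD_MM2 = {"combinational": 4.116, "sequential": 5.012, "total": 24.876}
AREA_FT_MM2 = {"combinational": 8.232, "sequential": 39.903, "total": 79.616}


def overhead_factor(kind: str) -> float:
    """Ratio of fault-tolerant area to standard area for one cell category."""
    return AREA_FT_MM2[kind] / AREA_STANDARD_MM2[kind]


if __name__ == "__main__":
    for kind in AREA_STANDARD_MM2:
        print(f"{kind}: x{overhead_factor(kind):.2f}")
```

The combinational area doubles, as expected for DMR, while the sequential-cell area grows by roughly a factor of eight and the total by about 3.2. The source does not break the sequential figure down further, so which blocks account for that growth is not shown here.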

It is clear that the presented design methodology yields designs with a reduced maximal operating frequency. Power consumption and required silicon area are also degraded. On the other hand, protection against the latchup effect, as well as against single event upsets and transients, requires a trade-off between required hardware, power consumption, maximal operating frequency and a sufficient protection level against radiation effects.

The following figure presents a view of the MW switch chip floor plan, prepared using the developed fault-tolerant design methodology. To improve routing, the PNC is placed around the processor core, as can be seen in Fig. 9.

A placed SPS cell is shown in Fig. 10. It is important to note that the stripes in the MW switch processor core are generated in the Metal3 layer. This is done because the control signals are routed in the Metal2 layer.



Fig. 9. Floor plan view of fault tolerant MW processor



Fig. 10. SPS cell placed and routed (special route only)

#### V. CONCLUSION

Advanced redundant circuits need a latchup protection in order to operate in a reliable manner. Therefore, this paper introduces and describes newly developed single event latchup (SEL) protection switches and a technique for their integration into an ASIC design.

The main contribution of the presented work is the introduction and development of a methodology for highly reliable digital ASIC designs based on redundant circuits with latchup protection. Based on the implementation figures, it is easy to notice that the proposed design methodology comes at the price of an overhead of area and power. It is important to note that the area and power overheads have two different causes. The first cause is the additional control logic used to support the power protection technique in redundant circuits. The second cause for the area and power overhead is the implementation of the power network controller (PNC), important for proper operation of all integrated SPS cells.

The SPS cell itself does not affect the area overhead due to the mechanism by which it is integrated into an ASIC. This was achieved by an innovative technique to utilize area normally reserved for filler cells, while staying in the standard design flow.

#### ACKNOWLEDGEMENT

The research leading to these results has received funding from the European Union's Seventh Framework Program FP7 (2007 - 2013) under the grant agreement no. 284389, also referred to as VHiSSI.

#### REFERENCES

- [1] J. Von Neumann, "Probabilistic Logics", Automata Studies, Princeton University Press, 1956
- [2] R. E. Lyons, W. Vanderkulk, "The Use of Triple-Modular Redundancy to Improve Computer Reliability", IBM Journal April 1962
- [3] M. P. Baze, S. P. Buchner, D. McMorrow, "A Digital CMOS Design Technique for SEU Hardening", IEEE Transaction on Nuclear Science, Vol. 47, No. 6, pp. 2603-2608, December 2000
- [4] N. Rollins, M. Wirthlin, P. Graham, M. Caffrey, "Evaluating TMR Techniques in the Presence of Single Event Upsets", Military and Aerospace Programmable Logic Devices International Conference, Washington DC, USA, 2003
- [5] J. Teifel, "Self-Voting Dual-Modular-Redundancy Circuits for Single-Event-Transient Mitigation", IEEE Transactions on Nuclear Science, VOL. 55, NO. 6, (pp. 3435-3439), December 2008
- [6] D.B. Estreich, A. Ochoa, and R.W. Dutton, "An Analysis of Latchup Prevention in CMOS IC's Using Epitaxial Buried Layer Process", International Electron Device Meeting, 1978, pp. 76-84

- [7] R.R. Troutman, "Epitaxial Layer Enhancement of n-Well Guard Rings for CMOS Circuits", IEEE Electron Devices, 1983, (pp. 438-440).
- [8] S. Montenegro, V. Petrovic, and G. Schoof, "Network Centric Systems for Space Application," Advances in Satellite and Space Communications, SPACOM 2010, Page(s): 146 – 150
- [9] V. Petrovic, M. Ilic, G. Schoof, and S. Montenegro, "Implementation of Middleware Switch ASIC Processor", IEEE Telfor Journal, Vol. 4, No. 2, 2012
- [10] Maxwell Technologies, "Latchup Protection Technology<sup>TM</sup> (LPT) Overview", Available: http://www.maxwell.com/products/microelectronics/latchup-protection
- [11] V. Petrovic, M. Ilic, G. Schoof, Z. Stamenkovic, "Design Methodology for Fault Tolerant ASICs", Proc. of the 15th IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems Symposium (DDECS 2012), 8 (2012)
- [12] Cadence Custom IC Design Circuit Design, Available: http://www.cadence.com/products/cic/Pages/default.aspx
- [13] IHP GmbH, Leibniz Institute Frankfurt Oder, Available: www.ihp-microelectronics.com

### HDL IP Cores System as an Online Testbench Provider

Vladimir Zdraveski, Andrej Dimitrovski and Dimitar Trajanov

*Abstract* - A huge part of the HDL development process is spent on testing and simulation. Supporting the idea of testbench design automation, we present a module of the HDL IP Cores system, integrated with a client-side Eclipse plug-in, as an automatic testbench search engine embedded inside the designer's native programming environment. The concept is extended with the use of a simulator for compatibility verification and improved ranking of existing results.

Keywords - automation, HDL, search, testbench, verification.

#### I. INTRODUCTION

The frequent use of programmable chips in production systems requires reliability and stability, so the whole HDL design process must be kept at an enterprise level, guaranteeing customer satisfaction and security. Companies, aware of this fact, are very interested in improving the testing and verification methods for their products [1][2]. Some of them have separate testing departments fully dedicated to IP core verification [3][4], as well as advanced testing workflows that greatly decrease the error probability [5].

The era of open-source development has allowed a significant rise in application development. Outside the commercial world, there is a large body of open-source HDL code, including testbenches. There are several web portals, groups and online communities [6]-[12]. These repositories hold so many projects that it is very difficult to find the required component, and there is no central hub connecting them, so the designer has to visit them one by one. Many people worldwide work on similar projects and design closely related IP cores and testbenches but, unaware of each other, do the same thing and waste time.

There are also numerous open-source testbench files on the Internet, and enterprise development allows and encourages code reuse, which is of particular importance for saving time. However, it is very difficult to find the right one, because the set of testbench components available online is large and growing, and because the parts a testbench contains have to be checked before it is used in a project.

Moreover, large companies have their own huge repositories of testbenches used in the past (fully verified), and face the same problem: a lack of automation tools inside their borders that would support code reuse and speed up the development process. So, the testing and debugging process is still quite time-consuming [13] and commercially inefficient.

Vladimir Zdraveski, Andrej Dimitrovski and Dimitar Trajanov are with the Department of Information Systems and Network Technologies, Faculty of Computer Science and Engineering, "Ss. Cyril and Methodius" University, Ruger Boskovik 16, 1000 Skopje, Macedonia, E-mail: {vladimir.zdraveski, dimitar.trajanov, andrej.dimitrovski}@finki.ukim.mk

HDL design undoubtedly depends on testing and simulation, which sometimes take a lot of time [14]. Very often, the time required to write a testbench is comparable to the time required to prepare the IP core itself, which is rather unwelcome news for company management teams.

To that end, we present a concept for testbench retrieval and seamless integration into the designer's native workspace. Using our existing HDL IP Cores system [15] and integrating existing client-side tools and simulators, we go a step further and provide users with automated testbench search and download directly inside their HDL programming environment.

#### II. Related Work

#### A. HDL IP Cores System - Overview



Fig. 1 - HDL IP Cores architecture

The system, Fig. 1, consists of a server-side application and a client-side Eclipse plug-in. The server-side application contains a web crawler that downloads IP cores from the Internet; each IP core then passes through an annotation process. The metadata, together with the IP core source information, are stored in an RDF repository. The HDL IP Cores system performs a deeper semantic annotation and ranks components by similarity and compatibility with a given component.

The second part of our system is the client-side Eclipse plug-in, which accesses the core system through a web service interface. The main idea is to let users search and download the IP cores available and annotated in the server-side repository. We use the "Sigasi" editor plug-in [16] for the client-side HDL functionality, but any other plug-in that provides an HDL editor may be used. Annotation is based on the knowledge captured in our custom ontology [17].

#### B. Functionalities of the existing simulators

Verissimo System Verilog Testbench Linter [18] is a coding guideline and verification methodology compliance checker, that enables engineers to perform an additional audit of their testbenches. With this tool, designers can check whether their code is free of language pitfalls and semantic or style issues, and compliant with the appropriate methodologies. Verissimo can be customized to check specific group or corporate coding guidelines to ensure consistency and best practices in code development. For example, the possibility of implementing the same functionality in multiple ways may impact the simulation performance or lead to unexpected behavior.

VCS' Native Testbench (NTB) [19] technology provides built-in natively-compiled support for full-featured System Verilog and Open Vera testbenches, including object-oriented, constrained-random stimulus and functional coverage capabilities. VCS further expands its capabilities with Echo constraint expression convergence technology. Echo automatically generates stimuli to efficiently cover the testbench constraint space, significantly reducing the manual effort needed to verify a large number of functional scenarios. Echo is a perfect fit for all teams using System Verilog testbenches with random constraints.

Cadence's Enterprise Simulator [20] supports all IEEE-standard languages, the Open Verification Methodology (OVM), Accellera's Universal Verification Methodology (UVM), and the e Reuse Methodology (eRM), making it quick and easy to integrate with established verification flows. Enterprise Simulator provides a high-throughput channel between the testbench and the device under test (DUT). This enables automated metric-driven verification of embedded software exactly as if it were another part of the DUT. Today, Enterprise Simulator fuels testbench automation, reuse, and analysis to verify designs from the system level, through RTL, to the gate level. It supports the metric-driven approach implemented by Incisive Enterprise Manager. Its native-compiled architecture speeds the simultaneous simulation of transaction-level, behavioural, low-power, RTL, and gate-level models, which is critical to the verification of a modern multi-language, multi-abstraction and mixed-signal SoC.

Xilinx ISE Simulator (ISim) [21] provides a complete, full-featured HDL simulator integrated within ISE. Thanks to the tight integration of ISim with the design environment, HDL simulation can become an even more fundamental step within the design flow. Xilinx tools automatically generate lines of VHDL code in the testbench file to get the designer started with the definition of circuit signals, inputs and outputs. The simulator also has a few other tools to run, pause and stop the simulation.

Besides the mentioned simulator implementations and their basic logic [22], there are also useful ideas for conceptual improvement [23][24]. All of the simulators mentioned above have different features for simulating (verifying) testbench components. However, they only come into play after the designer has manually instantiated a testbench component in the simulator. None of them offers an automatic online search for testbench components and easy code reuse.

#### IV. Test bench provider module

Our approach is a context-aware testbench search tool that uses an ontology-based knowledge base. Our system uses semantically annotated data to find the right testbench component and integrate it into the user's development environment, in our case Eclipse. The HDL IP Cores system is used to generate the result list of testbenches. In addition, a server-side verification of the compatibility between the testbench and the component is required. The system should perform a verification, i.e. an automatic simulation of the testbench, producing a ranked result list of suitable testbenches. The simulation could also be run on the client side, using a simulator embedded in our plug-in or a third-party simulator plug-in already installed in the client environment.

#### A. Test-bench retrieval process

In our system, the search for a testbench is performed by a search engine based on an OWL domain ontology and an RDF knowledge base [15]. Since HDL files have a predefined structure, the annotation is done automatically, using the custom ontology as a domain data schema. The process requires no further input from the end users, which is a step forward in the improvement of HDL code search engines.

Our tool enables the designer to search directly from the workspace, i.e. to run the "Find Testbench" tool, Fig. 2, with a right-click on the component file. The user's component is then sent to the HDL IP Cores server, annotated and used as input to the generator of the list of compatible testbenches.

[Screenshot: Eclipse project view with the context menu entries "Find Testbench", "Find Similar", "Find Compatible", "Search", "New Component From Template" and "New System From Template"]

Fig. 2 - Client side functionality

The search is performed by matching the semantic concepts specified in the user's request against the semantic annotations of the components available in the RDF repository of the system, Fig. 3. A very important part of the result list is the port map between the user's component and the testbench's component (the component instantiated inside the testbench in the repository, Fig. 3), which enables the client plug-in to instantiate the user's component inside the testbench and run the simulation automatically.

A new window displays the testbench components that correspond to the selected component. This is especially important for the designer, who obtains a testbench compatible with the component within the project, without browsing the Internet and downloading hundreds of files and archives.
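To make the idea of a port map concrete, the sketch below pairs the ports of a user component with those of the component expected by a testbench. It is a simplified stand-in for the ontology-based matching described here: the heuristic (identical names first, then the first free port with the same direction and width) and all names are assumptions for illustration only.

```python
def build_port_map(user_ports, tb_dut_ports):
    """Map each testbench DUT port to a user port with equal direction and width.

    Both arguments are dicts: port name -> (direction, width).
    Returns a dict tb_port -> user_port, or None if no complete
    one-to-one mapping exists.
    """
    free = dict(user_ports)
    mapping = {}
    # Pass 1: ports with identical name and signature.
    for name, signature in tb_dut_ports.items():
        if free.get(name) == signature:
            mapping[name] = name
            del free[name]
    # Pass 2: first remaining user port with the same direction and width.
    for name, signature in tb_dut_ports.items():
        if name in mapping:
            continue
        match = next((n for n, s in free.items() if s == signature), None)
        if match is None:
            return None
        mapping[name] = match
        del free[match]
    return mapping if not free else None
```

A `None` result would mark the testbench as incompatible, while a complete mapping is what a client plug-in would need in order to instantiate the user's component inside the testbench.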



Fig. 3 - The matching process

The system provides a preview of the testbench component and, once the user has made a choice, the testbench is downloaded instantly into the project, leaving the user just one step away from running the simulation.

Note that the automatic file download is done at run time, directly from the original file URL. No HDL code is stored in the HDL IP Cores repository, which respects intellectual property and licensing and allows for possible changes of licensing conditions in the future. Direct download is available for license-free IP cores only; when the IP core is published under a different license, the system redirects the user to the IP core provider's web site.

#### B. Testbench compatibility verification

The testbench generates input values for the design under test and checks the responses, so the design must be simulated carefully to find errors. The patterns, i.e. the VHDL simulation stimuli, are described in a specific formalism that can be captured using a dedicated pattern generation language. Once a VHDL behavioral description is written and a set of test vectors has been determined, a functional simulation is started.

To simulate a design, the testbench must be compatible with the architecture; otherwise warnings or errors will appear in the simulator's output. We propose to instantiate the testbench, run the simulator automatically and use the simulator's output in the ranking of the testbench results. Moreover, the result list may contain warning and error flags on each result item, notifying the user that choosing that testbench will require reviewing and correcting the warnings and/or errors before the simulation can proceed. In our system there are two possible verification solutions, a client-side and a server-side one.
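The proposed use of simulator output for ranking can be sketched as follows. The record fields, the ordering rule (fewer errors, then fewer warnings, then higher semantic score) and the `semantic_score` value itself are illustrative assumptions; the text only states that warnings and errors feed into the ranking and are shown as flags.

```python
from dataclasses import dataclass


@dataclass
class SimulationResult:
    testbench: str
    errors: int
    warnings: int
    semantic_score: float  # hypothetical similarity score from the search step


def rank(results):
    """Order results: fewer errors, then fewer warnings, then higher score."""
    return sorted(results, key=lambda r: (r.errors, r.warnings, -r.semantic_score))


def flags(result):
    """Flags to show next to a result item in the list."""
    pairs = (("ERROR", result.errors), ("WARNING", result.warnings))
    return [label for label, count in pairs if count]
```

A weighted score combining simulation and semantic results would be an equally plausible design; the lexicographic order is used here only because it is easy to read.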

The first scenario is the simulation on the client-side application (Eclipse plug-in) that includes simulation of the actual testbench component. There are several different simulation tools. One is DVT Eclipse, which is a plug-in for Eclipse [25], providing a satisfying environment to simulate a testbench. DVT integrates seamlessly with all major hardware simulators to enable simplified simulation analysis. The designer may execute the simulation of the testbench component right in his project. Using the capabilities of DVT and its integration into our system will enable automatic compatibility verification on the list of testbenches received from the HDL IP Cores server.

This will allow verification of the semantically annotated testbenches for the selected component and will show the designer the testbenches that are most compatible with the component architecture. The simulator would be integrated on the client side as a part of the Eclipse Core Packages. It will accelerate the search for the most compatible testbenches and will save time compared with manual simulation (Fig. 4), because the testbench will already be inside the designer's project. The environment will also perform syntax and semantic checks, with errors highlighted as the designer types, and will do an initial port mapping.



Fig. 4 - Client side simulator

Since the compatibility check is made on the client side, we may optionally provide integration with a few other simulators available for Eclipse, so that the user can choose a preferred one and use it together with our system.

The second possible solution architecture, Fig. 5, is to place the simulator [26] on the server side. Its role will again be to determine the compatibility between components and testbenches in order to obtain ranked results. Integrating the simulator on the server side would generate overhead data and additional server CPU usage due to the verification simulation. The delay will depend on the size and type of the component and the testbench, but the client will get the final list, and the HDL designer will not have to install an additional client-side plug-in, only the HDL IP Cores plug-in.



Fig. 5 - Server side simulator

#### IV. CONCLUSION AND FUTURE WORK

In order to speed up the testbench generation process, we described the testbench search feature of our existing system and two possible improvement concepts using a third-party HDL simulator. The system architecture and the client-side functionality were described in detail, providing a global picture of the whole concept.

The next steps would be to implement both scenarios, run performance tests, and conduct an evaluation survey in order to collect user feedback.

The main benefit of our proposed concept is that testbenches will be annotated according to a central ontology (inside a company or worldwide) and users will be able to find and download a specific testbench faster and more easily, directly from their native design workspace, without needing to open a browser and visit tens of web pages.

In a commercial environment it is possible to deploy a local instance of the HDL IP Cores system and integrate it with the company's native code storage engine, inheriting user credentials, access permissions and rules. In this way we will provide HDL designers with the described functionalities while keeping their code repositories inside the company, which is of essential importance in an enterprise environment.

#### ACKNOWLEDGEMENT

The work in this paper was partially financed by the Faculty of Computer Science and Engineering, at the "Ss. Cyril and Methodius" University in Skopje, as a part of the research project "Semantic Sky 2.0: Enterprise Knowledge Management".

#### References

- [1] Arabi, K., "Special session 6C: New topic mixed-signal test impact to SoC commercialization," VLSI Test Symposium (VTS), 2010 28th, vol., no., pp.212,212, 19-22 April 2010.
- [2] Chung-Yang Huang; Yu-Fan Yin; Chih-Jen Hsu; Chung-Yang Huang; Ting-Mao Chang, "SoC HW/SW verification and validation," *Design Automation Conference (ASP-DAC), 2011 16th Asia and South Pacific*, vol., no., pp.297,300, 25-28 Jan. 2011.
- [3] He Zhang; Chunyu Wu; Wenjing Zhang; Jiwei Wang, "Design on SOC module-level functional verification platform," *Mechanic Automation and Control Engineering (MACE), 2011 Second International Conference on*, vol., no., pp.4012,4015, 15-17 July 2011.
- [4] Qingdong Meng; Zhaolin Li; Fang Wang, "Functional verification of external memory interface IP core based on restricted random testbench," *Computer Engineering and Technology (ICCET), 2010 2nd International Conference on*, vol.7, no., pp.V7-253,V7-257, 16-18 April 2010.
- [5] Lulu Feng; Zibin Dai; Wei Li; Jianlei Cheng, "Design and application of reusable SoC verification platform," *ASIC (ASICON), 2011 IEEE 9th International Conference on*, vol., no., pp.957,960, 25-28 Oct. 2011.
- [6] Open Cores web portal, http://opencores.org/
- [7] Java optimized processor group, http://tech.groups.yahoo.com/group/java-processor/
- [8] IP supermarket – web portal, http://www.ipsupermarket.com/index.php
- [9] Infineon web portal, http://www.ipsupermarket.com/index.php
- [10] Lattice web portal, http://www.latticesemi.com/
- [11] Chip Estimate web portal, http://www.chipestimate.com/
- [12] Design & Reuse web portal, http://www.design-reuse.com/
- [13] Cheng, X.; Ruan, A.W.; Liao, Y.B.; Li, P.; Huang, H.C., "A run-time RTL debugging methodology for FPGA-based co-simulation," *Communications, Circuits and Systems (ICCCAS), 2010 International Conference on*, vol., no., pp.891,895, 28-30 July 2010.
- [14] Corno, F.; Sanchez, E.; Reorda, M.S.; Squillero, G., "Automatic test program generation: a case study," *Design & Test of Computers, IEEE*, vol.21, no.2, pp.102,109, Mar-Apr 2004.
- [15] V. Zdraveski, M. Jovanovik, R. Stojanov and D. Trajanov. "HDL IP Cores Search Engine based on Semantic Web Technologies", *ICT Innovations 2010, Communications in Computer and Information Science,* Volume 83, 2011, pp 306-315, Ohrid, Macedonia, September 2010.
- [16] Sigasi intelligent HDL editor plug-in for Eclipse, http://www.sigasi.com/
- [17] V. Zdraveski, D. Trajanov. "VHDL IP Cores Ontology", Conference for Informatics and Information Technology (CIIT), Bitola, April 2013.

- [18] Verissimo System Verilog Testbench Linter http://www.dvteclipse.com/Verissimo\_SystemVerilog Testbench Linter.html
- [19] VCS' Native Testbench (NTB) http://www.synopsys.com/Tools/Verification/FunctionalVerification/Documents/vcs-ds.pdf
- [20] Cadence's Enterprise Simulator http://www.cadence.com/products/fv/enterprise\_simulator/pages/default.aspx
- [21] Xilinx ISE Simulator (ISim) http://mazsola.iit.unimiskolc.hu/~kulcsfm/DigRendII elemei/VHDL anyag ok/ISE Simulator ISim VHDL TestBenchTutorial.p df
- [22] Brown, A.D.; Nichols, K. G.; Zwolinski, M., "Issues in the design of a logic simulator: an improved caching technique for event-queue management," *Circuits, Devices and Systems, IEE Proceedings* -, vol.142, no.5, pp.293,298, Oct 1995.
- [23] Maksimovic, D.M.; Litovski, V.B., "Timing simulation with VHDL simulators," *Microelectronics*, 2002. MIEL 2002. 23rd International Conference on , vol.2, no., pp.655,658, 2002.
- [24] Maksimovic, D.M.; Litovski, V.B., "Tuning logic simulators for timing analysis," *Electronics Letters*, vol.35, no.10, pp.800,802, 13 May 1999.
- [25] DVT Eclipse client side integration http://www.dvteclipse.com/?gclid=CM3Ex\_G4\_7oCFc1V3godKA4Abw
- [26] Cadence simulator server side integration http://www.cadence.com/ip/vip/pages/default.aspx

### Computer Workstation Vetting by Supply Current Monitoring

# Marko Dimitrijević, Miona Andrejević Stošović, Octavio Nieto, Slobodan Bojanić, and Vančo Litovski

*Abstract* – The goal of this project is to develop a powerful electronic system capable of claiming, with high certainty, whether malicious software is running alongside a workstation's normal activity. The new product will be based on measurement of the supply current drawn by the workstation from the grid. A unique technique is proposed that analyses the supply current to produce information about the state of the workstation and about the presence of malicious software running alongside the legitimate applications. The testing is based on comparison of the behavior of a fault-free workstation (established in advance) with the behavior of the potentially faulty device.

*Keywords* – monitoring, malicious software, supply current.

#### I. INTRODUCTION

This work is based on advanced analysis of the power supply current of the Device Under Test (DUT) with the aim of detecting malicious activity. The method stems from long-term research in the fields of electronic design, testing, diagnosis, statistical analysis, and artificial neural network applications within the Laboratory for Electronic Circuit Design Automation at the University of Niš.

| State                     | V (RMS) | I (RMS) | TPF (%) | THDI (%) | P (W) | QB (W) | U (W) | D (W) |
|---------------------------|---------|---------|---------|----------|-------|--------|-------|-------|
| Hibernation               | 217.5   | 0.090   | 5.29    | 18.4     | 1.04  | -19.3  | 19.6  | 3.3   |
| Standby                   | 218.7   | 0.093   | 12.4    | 32.2     | 2.53  | -19.3  | 20.4  | 6.0   |
| Idle                      | 217.8   | 0.339   | 88.5    | 16.4     | 65.4  | -31.9  | 73.9  | 12.6  |
| High load<br>(Video)      | 218.0   | 0.348   | 89.1    | 16.2     | 67.7  | -32.3  | 76.0  | 12.1  |
| High load<br>(Simulation) | 217.6   | 0.537   | 95.0    | 13.0     | 111.  | -32.1  | 117.  | 18.3  |

# TABLE I MEASURED CONSUMPTION OF A PC

M. Dimitrijević and M. Andrejević-Stošović are with the Department of Electronics, Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, 18000 Niš, Serbia. E-mail: {marko, miona}@elfak.ni.ac.rs.

O. Nieto and S. Bojanić are with Technical University of Madrid, {nieto, sbojanic}@

V. Litovski is with Cluster of Advanced Technologies NiCAT Niš, vanco@elfak.ni.ac.rs.

As part of the research task to characterize the personal computer as an energy consumer [1], the measurements in Table 1 were reported.

The above measurements were obtained for a desktop computer (DELL Optiplex 980, Intel Core i7 CPU @ 2.8GHz, 4GB RAM, 500GB HDD). The characterization variables are as follows: the RMS value of the line voltage (V), the RMS value of the line current (I), the total power factor (TPF), the total harmonic distortion of the current (THDI), the active power (P), the reactive power (QB), the apparent power (U), and the distortion power (D). Several states of activity were considered: Hibernation, Standby, Idle, High load (Video), and High load (Simulation).
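The quantities in Table 1 are not independent, which a short worked example can show. The sketch below assumes the Budeanu power decomposition, $U^2 = P^2 + Q_B^2 + D^2$, which the subscript in QB suggests but which is not stated explicitly here; the function name is illustrative only.

```python
import math


def power_quantities(v_rms, i_rms, p_active, q_budeanu):
    """Derive apparent power, total power factor and distortion power.

    Assumes the Budeanu decomposition U^2 = P^2 + QB^2 + D^2.
    """
    apparent = v_rms * i_rms                      # U = V * I
    tpf = p_active / apparent                     # TPF = P / U
    residual = apparent ** 2 - p_active ** 2 - q_budeanu ** 2
    distortion = math.sqrt(max(residual, 0.0))    # D, clamped against rounding
    return apparent, tpf, distortion


# "Idle" row of Table 1: V = 217.8, I = 0.339, P = 65.4, QB = -31.9
u, tpf, d = power_quantities(217.8, 0.339, 65.4, -31.9)
print(f"U = {u:.1f}, TPF = {100 * tpf:.1f} %, D = {d:.1f}")
```

Under this assumption the derived values for the Idle state land close to the tabulated U, TPF and D, with small differences attributable to rounding of the printed inputs.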

Table 1 demonstrates that the internal activity of the computer can be deduced from power characterization. A particularly interesting aspect is harmonic distortion, represented here by the THDI. The computer, like most electronic loads on the grid, is nonlinear: it distorts the grid current from its original sinusoidal waveform, i.e., it creates harmonics that are measurable from the grid side. The current waveform, as may be partly deduced from Table 1, depends on the activities within the computer, and so do the harmonics. This suggests that by measuring the current waveform one may try to identify malicious activities.

Similar methods are applied in IDDQ testing and diagnosis of CMOS electronics [2]. Here, every event in a CMOS digital system introduces a very short pulse into the DC supply current of the circuit. These pulses aggregate to form the DC current (IDD), imprinting information about all activities in the system to the supply current.

The findings in [1] and [2] suggest a correspondence between the activity within the device and the measured characteristics of the supply current. Our proposal is to develop a technique that uses this correspondence to compare a vetted device, referred to below as the DUT, with a standard fault-free device.

The analysis process can be described as a sequence of the following steps:

- 1. Measurement
- 2. Creation of time series "strings"
- 3. Spectrum computation
- 4. Comparisons - similarity evaluation
- 5. Proposing a hypothesis

The following sections describe the proposed system in more detail.

#### II. THE MEASUREMENT SUBSYSTEM

The physical structure of the Vetting by Supply Current Monitoring (VbCM) system we are proposing here is depicted in Fig. 1. It is connected to the power grid via the acquisition module (AC) and transfers power directly to the DUT (load) while sampling the values of the current and voltage waveforms passing through. The sampled values are appropriately conditioned and coded, and then delivered directly to the testing computer (TC) via a USB or Wi-Fi connection. The software implemented in the TC performs all computations. The TC is used as a visualization device, displaying the measured and derived waveforms; as an interactive monitor, allowing monitoring and control of the characterization process; as a data storage device, creating measurement logs and databases; and as a communication means, enabling remote control of the measurement and on-line delivery of the results.



Fig 1. Physical structure of the VbCM system (The 220 Vrms is related to European standards)

The acquisition module performs acquisition and conditioning of the electrical signals. The module for signal conditioning of the voltage and current waveforms provides attenuation, isolation, and antialiasing.

In the present three-phase application, acquisition is performed by a National Instruments cDAQ-9714 expansion chassis, which provides hot-plug module connectivity. The chassis is equipped with two data acquisition modules: NI9225 and NI9227. It is connected via a USB interface to the TC, which runs the virtual instrument. The NI9225 has three channels of simultaneously sampled voltage inputs with 24-bit resolution, a 50 kSa/s per-channel sampling rate, and 600 VRMS channel-to-earth isolation, making it suitable for voltage measurements up to the 100th harmonic (5 kHz). Its 300 VRMS range enables line-to-neutral measurements on 110 V or 240 V power grids. The NI9227 is a four-channel input module with 24-bit resolution and a 50 kSa/s per-channel sampling rate, designed to measure 5 ARMS nominal and up to 14 A peak on each channel, with 250 VRMS channel-to-channel isolation. The virtual instrument is realized in the National Instruments LabVIEW development package, which simplifies the creation of virtual instruments. A virtual instrument consists of an interface to the acquisition module and an application with a graphical user interface.

#### III. CREATION OF TIME SERIES

In this stage, the signal obtained after testing is converted to time series. There are three scenarios for obtaining samples:

- Boot sequence
- Idle state
- Application execution

During the measurement phase of the test, a time series is obtained for the selected device activity as above. For example, Table 1 lists five states, of which the first three have no application running while the last two heavily load the processor and the video card. The number of test runs and the choice of applications are subject to further analysis.

A further research task is to establish the length of the data strings and how many strings will be required for every state of the DUT. In our experience, 200 ms (for the 50 Hz case) is sufficient for obtaining the spectrum of the current by the Goertzel algorithm. The strings measured for the given set of states of the fault-free device are sufficient for its characterization, since stationary conditions are established and no change in time is expected. However, the length of testing of the potentially 'faulty' device is also subject to further research.

#### IV. COMPARISONS - SIMILARITY EVALUATION

From our experience in time series prediction [3,4], classification [5] and diagnosis [6,7,8], the subject of comparison of the responses of the fault-free and the potentially faulty DUT will be an important research issue within this project.

According to [9], there are several measures of similarity of time series that may be used concurrently. For example, the correlation coefficient may be calculated as:

$$r_{pq} = \frac{\sum_{k=0}^{N-1} \left(I_k^p - \bar{I}^p\right) \left(I_k^q - \bar{I}^q\right)}{\sqrt{\sum_{k=0}^{N-1} \left(I_k^p - \bar{I}^p\right)^2} \sqrt{\sum_{k=0}^{N-1} \left(I_k^q - \bar{I}^q\right)^2}}$$
(1)

where $I^p$ is the first series, $I^q$ is the second series, $\bar{I}^p$ and $\bar{I}^q$ are their mean values, and $N$ is the number of samples.

To demonstrate, we made new measurements, similar to the ones reported in Table 1, some results of which are presented here. The DELL Optiplex 980, running Windows 7 Professional, was considered in the following states:

- 1. Off: the computer was switched off.
- 2. Idle: only the operating system was running (65 processes and 975 threads were active during the measurement while 0% of the CPU was utilized).
- 3. Video: 4 MPEG4 video streams were played simultaneously while the processor was loaded to 4%-8%.
- 4. CPU Arithmetic: the synthetic Dhrystone iSSE4.2 and Whetstone iSSE3 benchmark tests were run with 100% CPU load.
- 5. Multi-Media CPU: the synthetic Multi-Media Int x16 iSSE4.4, Float x8 iSSE2, and Double x4 iSSE2 tests were run with 100% CPU load.
- 6. GPU Rendering: render benchmark tests were run to test the graphics processor: NativeFloatShaders, EmulatedDouble-Shaders.
- 7. Physical Disks: a test evaluating disk performance was run: Physical disk benchmark WDC5000AAKS-007AA0.
- 8. File System Benchmark: a performance test of the I/O file system was run.
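Equation (1) can be evaluated directly from two sampled strings. The following is a minimal sketch in plain Python; the function name and the error handling are illustrative choices rather than part of the measurement software.

```python
import math


def correlation_coefficient(series_p, series_q):
    """Correlation coefficient r_pq of two equally long sample strings, as in (1)."""
    if len(series_p) != len(series_q) or not series_p:
        raise ValueError("series must be non-empty and of equal length")
    n = len(series_p)
    mean_p = sum(series_p) / n
    mean_q = sum(series_q) / n
    # Numerator: sum of products of the deviations from the mean values.
    numerator = sum((a - mean_p) * (b - mean_q)
                    for a, b in zip(series_p, series_q))
    # Denominator: product of the square roots of the summed squared deviations.
    norm_p = math.sqrt(sum((a - mean_p) ** 2 for a in series_p))
    norm_q = math.sqrt(sum((b - mean_q) ** 2 for b in series_q))
    if norm_p == 0.0 or norm_q == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return numerator / (norm_p * norm_q)
```

A string compared with itself gives 1, and a string compared with its inverted copy gives -1, which is the range spanned by the entries of a correlation matrix.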

#### TABLE II CORRELATION COEFFICIENTS FOR THE SIGNALS OBTAINED FOR DIFFERENT STATES OF THE WORKSTATION

| $r_{pq}$ (p →, q ↓)   | Off      | Idle     | Video    | CPU<br>Arithmetic | GPU<br>Rendering | Multi-Media<br>CPU | Physical<br>Disks | File System<br>Benchmark |
|-----------------------|----------|----------|----------|-------------------|------------------|--------------------|-------------------|--------------------------|
| Off                   | 1        |          |          |                   |                  |                    |                   |                          |
| Idle                  | -0.92574 | 1        |          |                   |                  |                    |                   |                          |
| Video                 | 0.287788 | -0.01358 | 1        |                   |                  |                    |                   |                          |
| CPU Arithmetic        | -0.96037 | 0.857545 | -0.49706 | 1                 |                  |                    |                   |                          |
| GPU Rendering         | -0.90384 | 0.986579 | 0.079612 | 0.811307          | 1                |                    |                   |                          |
| Multi-Media CPU       | 0.867432 | -0.97951 | -0.13936 | -0.7647           | -0.99222         | 1                  |                   |                          |
| Physical Disks        | -0.60827 | 0.798615 | 0.528475 | 0.422196          | 0.862064         | -0.88421           | 1                 |                          |
| File System Benchmark | 0.785238 | -0.59354 | 0.771507 | -0.91172          | -0.52713         | 0.458711           | -0.05733          | 1                        |

Strings of 10,000 samples (200 ms, or 10 periods of the 50 Hz grid) were taken with a sampling period of 20 $\mu$s. Applying (1) yielded the correlation matrix given in Table 2. While these results may be used for further analysis (for example, a high correlation coefficient means that the same resources were (equally) active during the run of two particular software packages), they are introduced here only to demonstrate what is expected to be done with the measurement results when comparing the behavior of the fault-free device and the DUT.

In the tables that will be used for testing, the rows indicating the states will be kept as above, while the columns will be related to the strings displaced in time, obtained by repeated measurements of the same state. Accordingly, the table will be a matrix, albeit not a symmetric one. A search will be performed for the cases with the smallest correlation coefficients, which indicate a difference in the behavior of the DUT and the fault-free device.

#### V. COMPARISONS - SPECTRUM COMPUTATION

Additionally, analysis will be performed in the spectral domain for each time series string. The transformation almost entirely preserves the information content of the string while reducing the data size. Because the frequency of the grid is unstable, the frequency is to be extracted first [10]. Then, an algorithm is to be implemented for the Discrete Fourier Transform (DFT) that is resistant to the instability of the period of the signal. The Goertzel algorithm [11] will then be applied.
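The spectral step can be sketched as follows. This is a minimal illustration of the Goertzel recursion evaluated at an arbitrary frequency, assuming that the grid frequency has already been extracted by a separate procedure and is passed in as `fundamental_hz`; the function names and the amplitude scaling (2/N, valid for a string spanning a whole number of periods) are choices made for this example only.

```python
import math


def goertzel_amplitude(samples, sample_rate_hz, freq_hz):
    """Amplitude of the spectral component at freq_hz via the Goertzel recursion."""
    n = len(samples)
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    coeff = 2.0 * math.cos(omega)
    s_prev = 0.0    # s[k-1]
    s_prev2 = 0.0   # s[k-2]
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    # Squared magnitude of the DFT-like sum at the chosen frequency.
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return 2.0 * math.sqrt(max(power, 0.0)) / n


def odd_harmonics(samples, sample_rate_hz, fundamental_hz, highest=49):
    """Amplitudes of the odd harmonics 1, 3, ..., highest of the fundamental."""
    return {h: goertzel_amplitude(samples, sample_rate_hz, h * fundamental_hz)
            for h in range(1, highest + 1, 2)}
```

For a 200 ms string sampled at 50 kSa/s, a call such as `odd_harmonics(string, 50_000.0, 50.0)` would return 25 values, one per odd harmonic up to the 49th. Because the target frequency is a free parameter, the same routine can follow a fundamental that deviates slightly from 50 Hz.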

Approximately 50 harmonics may be observed in a sample (string) of a grid current. We expect approximately 10 strings per device to be created for every state of the device, one of them stored for the fault-free device, to be used as a base for comparisons.

For the measurement of the fault-free device analyzed in Table 2, an appropriate DFT was performed. The spectral components obtained are presented in Table 3. Since the even harmonics have incomparably smaller values than the odd ones, only the DC, the main, and the odd harmonics are presented in Table 3. Figure 2 illustrates two columns of Table 3.

While these results may be used for further analysis, we present them here as a kind of proof of concept: no harmonic has the same value in any two states, yet similarities can still be extracted.

#### VI. CREATING THE MOST PROBABLE HYPOTHESIS

The testing will be implemented in the following way.

First, for the fault-free device, one string of the supply current will be created after steady state is reached for every test program. We will denote these strings by $S_i$, i=1,2,...,n, where n is the number of software packages developed for testing purposes. At present we are not sure what value of n will be necessary.

Then, for every installed malicious software, after equilibrium is reached, m strings of supply current will be taken, separated by a fixed time interval, for all n testing software packages. We will denote these strings by $Q_{i,j}$, i=1,2,...,n, and j=1,2,...,m.

Accordingly, there will be $n \times m$ strings generated from the DUT for every malicious attack.

| Harm. No.                | DC    | 1      | 3     | 5     | 7     | 9     | 11    | 13   | 15    | 17   | 19   | 21   | 23   |
|--------------------------|-------|--------|-------|-------|-------|-------|-------|------|-------|------|------|------|------|
| Off                      | 0.55  | 89.7   | 3.05  | 8.55  | 8.94  | 3.08  | 8.76  | 2.77 | 6.28  | 4.81 | 0.69 | 0.92 | 0.62 |
| Idle                     | -0.84 | 400.26 | 47.9  | 23.18 | 11.41 | 9.19  | 6.17  | 1.4  | 9.81  | 3.66 | 4.16 | 7.39 | 5.17 |
| Video                    | 1.3   | 475.4  | 54.03 | 23.52 | 12.3  | 7.7   | 7.24  | 1.73 | 12.19 | 5.1  | 5.05 | 6.52 | 7.15 |
| CPU Arithmetic           | 0.52  | 785.73 | 34.6  | 28.7  | 17.43 | 10.12 | 12.27 | 6.01 | 5.98  | 8.91 | 5.74 | 4.89 | 6.06 |
| GPU Rendering            | -0.68 | 747.73 | 35.84 | 28.42 | 16.77 | 9.26  | 11.13 | 5.81 | 6.84  | 9.9  | 5.68 | 5.12 | 7.19 |
| Multi-Media CPU          | -1.3  | 394.33 | 47.79 | 22.83 | 9.74  | 9.17  | 6.12  | 1.99 | 9.32  | 5.6  | 3.3  | 6.65 | 5.55 |
| Physical Disks           | 0.23  | 381.54 | 48.05 | 23.53 | 6.96  | 8.63  | 5.36  | 2.49 | 9.94  | 3.76 | 5.75 | 5.55 | 4.56 |
| File System<br>Benchmark | 0.51  | 411.72 | 47.73 | 24.14 | 9.61  | 9.5   | 5.53  | 2.96 | 8.92  | 3.71 | 7.31 | 5.29 | 4.3  |

| Harm. No.                | 25   | 27   | 29   | 31   | 33   | 35   | 37   | 39   | 41   | 43   | 45   | 47   | 49   |
|--------------------------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| Off                      | 0.53 | 0.94 | 0.62 | 0.54 | 1.08 | 0.47 | 0.45 | 0.58 | 0.54 | 0.24 | 0.27 | 0.39 | 0.21 |
| Idle                     | 4.12 | 5.18 | 6.61 | 4.89 | 7.58 | 3.98 | 2.61 | 3.9  | 1.29 | 1.28 | 1.91 | 0.94 | 0.36 |
| Video                    | 6.2  | 8.31 | 6.35 | 3.64 | 5.23 | 2.72 | 2.09 | 2.83 | 0.97 | 0.46 | 0.85 | 0.98 | 0.53 |
| CPU Arithmetic           | 5.86 | 2.29 | 2.94 | 2.54 | 4.48 | 1.71 | 0.51 | 2.94 | 1.26 | 1.24 | 1.44 | 0.34 | 1.95 |
| GPU Rendering            | 4.63 | 1.28 | 4.3  | 3.61 | 3.67 | 1.59 | 0.93 | 3.55 | 0.56 | 0.67 | 1.79 | 0.48 | 1.78 |
| Multi-Media CPU          | 4.6  | 4.2  | 5.85 | 4.98 | 7.84 | 4.27 | 2.98 | 3.97 | 1.54 | 1.39 | 2.2  | 0.55 | 0.7  |
| Physical Disks           | 5.2  | 3.07 | 4.93 | 3.96 | 8.2  | 4.17 | 3.19 | 4.7  | 0.96 | 1.24 | 1.93 | 0.9  | 1.34 |
| File System<br>Benchmark | 4.76 | 6.35 | 6.26 | 5.16 | 7.34 | 2.94 | 2.2  | 2.81 | 1.11 | 1.82 | 1.77 | 1.03 | 0.95 |

TABLE III ODD HARMONICS EXTRACTED FROM ONE STRING MEASUREMENT IN EIGHT DIFFERENT STATES OF THE WORKSTATION




Fig 2. Measured odd harmonics in two cases: Physical disc drive active and CPU loaded by arithmetic computations. The first harmonic is omitted for convenience

Comparisons will be done within the *i*-th set $Q_{i,j}$, j=1,2,...,m, in order to find whether a change happens in time, i.e., whether there exist $Q_{i,k} \neq Q_{i,l}$. Here $\neq$ means "not similar", and $k, l \in \{1, 2, ..., m\}$. If so, there is a probability that the malicious attack was activated by the *i*-th testing software. The chosen $Q_{i,j}$ will then be compared with $S_i$ to reach the final decision. If they are similar, *i* will be incremented by one and the procedure will be repeated.

If none of the strings goes below the chosen similarity measure the conclusion will be that no malicious software is running.
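The procedure above can be sketched in code. The sketch assumes that the correlation coefficient of (1) serves as the similarity measure and that a single fixed `threshold` separates "similar" from "not similar"; both are assumptions made for illustration, since the measure and the threshold are still subject to research. It also ignores the time alignment of the strings, and the names `vet_device`, `reference` and `measured` are hypothetical.

```python
import math


def correlation(p, q):
    """Correlation coefficient of two equally long strings, as in (1)."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    num = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p)
                    * sum((b - mean_q) ** 2 for b in q))
    return num / den


def vet_device(reference, measured, threshold, similarity=correlation):
    """Return (i, j) of the first string judged malicious, or None.

    reference: dict mapping test index i to the fault-free string S_i
    measured:  dict mapping test index i to the list of DUT strings Q_i,j
    """
    for i in sorted(measured):
        strings = measured[i]
        # Step 1: look for a change in time within the i-th set.
        changed = set()
        for k in range(len(strings)):
            for l in range(k + 1, len(strings)):
                if similarity(strings[k], strings[l]) < threshold:
                    changed.update((k, l))
        # Step 2: compare the suspicious strings with the fault-free S_i.
        for j in sorted(changed):
            if similarity(strings[j], reference[i]) < threshold:
                return i, j
        # Otherwise continue with the next test program (i is incremented).
    return None
```

A return value of `None` corresponds to the conclusion that no malicious software is running, while a pair `(i, j)` identifies the test program and the string for which the dissimilarity was found.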

In what follows, we demonstrate that there are some potentially feasible procedures for measuring similarity. What we present here is not a solution but an indication of why we intend to use these mathematical tools.

Suppose the correlation matrix is used as the basis for deciding on the similarity between the responses of the fault-free and the faulty device. Since at least ten strings per state will be produced, a correlation coefficient will be calculated for every string of the fault-free device and the DUT for a given state. That is important since no time synchronization is preserved between the two measurements.

It is obvious that a proper search is to be done to find: (1) the most distant (least correlated) strings within a state and (2) the most distant strings of all. That would become the most probable hypothesis if the correlation coefficient is used as the basis for establishing the non-similarity of the behavior of the faulty and fault-free device. A threshold is to be foreseen in order to enable a conclusion as to whether the extracted minimal similarity may be pronounced a proof for the presence of malicious activity within the device. In that way the testing process may be terminated by a go-no-go statement.

| ANN's Output→<br>Input<br>vector↓ | Off        | Idle       | Video      | CPU<br>Arithmetic | GPU<br>Rendering | Multi-Media<br>CPU | Physical<br>Disks | File System<br>Benchmark |
|-----------------------------------|------------|------------|------------|-------------------|------------------|--------------------|-------------------|--------------------------|
| 1                                 | 0.94189    | -0.0082643 | -4.98E-05  | 0.0596502         | 0.0054563        | -2.69E-05          | 0.0025452         | 0.0012835                |
| 1 (target)                        | 1          | 0          | 0          | 0                 | 0                | 0                  | 0                 | 0                        |
| 2                                 | -0.100789  | 0.936809   | -6.30E-05  | 0.107029          | -0.0039056       | -4.57E-05          | 0.0353001         | 0.0301201                |
| 2 (target)                        | 0          | 1          | 0          | 0                 | 0                | 0                  | 0                 | 0                        |
| 3                                 | 0.0747284  | -0.0347075 | 1.00742    | -0.0946782        | 0.0368009        | 6.60E-06           | 0.0172143         | -0.0095049               |
| 3 (target)                        | 0          | 0          | 1          | 0                 | 0                | 0                  | 0                 | 0                        |
| 4                                 | 0.0530374  | -0.0051336 | -3.01E-05  | 0.94394           | 0.00599          | 4.07E-06           | -0.0031459        | 0.0039932                |
| 4 (target)                        | 0          | 0          | 0          | 1                 | 0                | 0                  | 0                 | 0                        |
| 5                                 | -0.0714551 | 0.141341   | 0.0002496  | 0.347383          | 0.694706         | 2.93E-05           | -0.0165517        | -0.0935344               |
| 5 (target)                        | 0          | 0          | 0          | 0                 | 1                | 0                  | 0                 | 0                        |
| 6                                 | -0.0390391 | -0.068559  | -2.64E-05  | 0.0464038         | -0.0182126       | 0.994595           | 0.0357881         | 0.0513166                |
| 6 (target)                        | 0          | 0          | 0          | 0                 | 0                | 1                  | 0                 | 0                        |
| 7                                 | 0.0221675  | -0.0245939 | -7.76E-06  | -0.0287134        | 0.0235965        | -8.00E-07          | 1.01758           | -0.010466                |
| 7 (target)                        | 0          | 0          | 0          | 0                 | 0                | 0                  | 1                 | 0                        |
| 8                                 | 0.0524894  | -0.0178626 | -6.26E-05  | -0.0587603        | 0.0177179        | 1.40E-06           | 0.0010393         | 1.00386                  |
| 8 (target)                        | 0          | 0          | 0          | 0                 | 0                | 0                  | 0                 | 1                        |

TABLE VI Responses of the ANN to noisy input data

Note that the information about the state that produces minimal similarity may be used as diagnostic information, since it identifies the state of the device in which the malicious activity happens.

Other statistical measures of similarity will not be excluded from the analysis.

On the other hand, if the harmonic spectrum is to be used for extracting the nonsimilarity measure, a proper method of hypothesis generation will have to be created.

For example, an ANN was trained to recognize which one of the sets of harmonics of Table 3 is present at its input. Its structure is depicted in Fig. 3. To simplify, for a given vector of harmonics the corresponding output of the ANN was forced to unity while the rest of the outputs were kept at zero. In other words, the ANN was trained to recognize which software was running on the computer. Full success was achieved: after training, the ANN classified all sets correctly.
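The target coding described above is a one-of-N code, and the recognized class can be read back as the index of the largest output. A minimal Python sketch of that coding follows. The function names are illustrative and not taken from the paper, and the two response vectors are copied from rows 1 and 5 of Table VI.

```python
def one_hot(index, size):
    """Target vector: 1 at the position of the running program, 0 elsewhere."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def decode(outputs):
    """Index of the largest ANN output, taken as the recognized class."""
    return max(range(len(outputs)), key=lambda k: outputs[k])


# ANN responses copied from Table VI (rows labelled 1 and 5).
response_1 = [0.94189, -0.0082643, -4.98e-05, 0.0596502,
              0.0054563, -2.69e-05, 0.0025452, 0.0012835]
response_5 = [-0.0714551, 0.141341, 0.0002496, 0.347383,
              0.694706, 2.93e-05, -0.0165517, -0.0935344]

# Zero-based class indices of the two responses.
print(decode(response_1), decode(response_5))
```

Even the weakest response in the table (row 5, with 0.694706 on the diagonal) is still decoded to the intended class by this rule.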

To make the problem harder, we transformed Table 3 so that every entry was recalculated by the formula

$$x_{\text{new}} = x \cdot [1 + (2 \cdot rnd - 1) \cdot 0.025]$$
(3)

where *rnd* is a pseudo-random number uniformly distributed over the segment [0,1]. In other words, "noise" with a peak-to-peak amplitude as large as 5% of the harmonic value was added as a "measurement disturbance". Again, as can be seen from Table 4, excellent classification was obtained.
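Eq. (3) can be sketched in a few lines of Python. The harmonic values below are placeholders rather than entries of Table 3, and the fixed seed is chosen only to make the run repeatable.

```python
import random


def add_noise(x, rel_amplitude=0.025, rng=random):
    """Apply Eq. (3): x_new = x * [1 + (2*rnd - 1) * rel_amplitude]."""
    rnd = rng.random()  # uniform pseudo-random number in [0, 1)
    return x * (1.0 + (2.0 * rnd - 1.0) * rel_amplitude)


rng = random.Random(0)            # fixed seed, an arbitrary choice
harmonics = [1.0, 0.5, 0.25]      # placeholder values, not from Table 3
noisy = [add_noise(h, rng=rng) for h in harmonics]
print(noisy)
```

Each perturbed value stays within ±2.5% of the original, which gives the 5% peak-to-peak disturbance mentioned in the text.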





Finally, eight new sets of "harmonics" were created artificially by permuting entries within the rows of Table 3, and the newly created columns were used as excitation for the ANN. None of them succeeded in deceiving the ANN.

At this moment we do not know which of the concepts will be the best one for detecting malicious activities in a computer. It is likely that several of them will be used simultaneously, i.e. that Multiple Criteria Assessment of Discrete Alternatives (MCDA) [12] will be applied. That would lead to an integral measure of nonsimilarity in which the outcomes of the different approaches (correlation of time series, correlation of harmonics, pattern recognition by ANNs, etc.) are weighted.
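One simple way to form such an integral measure is a weighted average of the individual nonsimilarity scores. The sketch below assumes that every approach delivers a score between 0 and 1. The score names and weights are hypothetical, and the paper does not prescribe this or any other aggregation rule.

```python
def integral_nonsimilarity(scores, weights):
    """Weighted average of the individual nonsimilarity scores."""
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical outcomes of three approaches and their assumed weights.
scores = {"time_series": 0.10, "harmonics": 0.30, "ann": 0.80}
weights = {"time_series": 1.0, "harmonics": 1.0, "ann": 2.0}
print(integral_nonsimilarity(scores, weights))
```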

### VII. DESCRIPTION OF THE SYSTEM

The analysis system consists of the measurement subsystem and the software subsystem.

Given a fault-free device, measurements are performed to produce strings of supply current samples. Several states will be analyzed. The resulting data will be stored on the TC.

To test a DUT, the same measurements are repeated several times. The obtained time series are used by the software subsystem to reach a decision. The software on the TC will provide an interactive, user-friendly interface for the test engineer. It will also allow for logging, documenting, reporting, storing, and post-processing of the resulting data.

# VIII. UNIQUE PROPERTIES OF THE SOLUTION

There are several aspects of our solution that we believe are novel and unique.

- 1. It is noninvasive. No change whatsoever happens in the software and hardware of the device.
- 2. The testing does not interfere with the regular activities of the device and therefore testing could potentially be performed on an active production device (e.g. network router).
- 3. The measurements, the measurement results, and the processing all take place outside the device, so no tampering with the test from within the device is possible.
- 4. No library (or lists) of malware is to be produced and updated.
- 5. The simplicity of the concept will allow for improvements that are not conceivable at the moment.
- 6. The solution is entirely independent of device classes, CPU architectures, operating systems etc.

While the proposed solution consists of activities that are known in the testing practice of electronic systems, its uniqueness comes from the fact that these ideas have never before been utilized and tuned for the detection of malicious activities in electronic devices.

#### References

- [1] Nieto, O., et al., "Energy Profile of a Personal Computer", Proceedings of the LVI Conf. of ETRAN, Zlatibor, Serbia, June 2012, ISBN 978-86-80509-67-9, Paper EL3.3-1-4.
- [2] Hurst, S. L., "VLSI testing: digital and mixed analogue/digital techniques", IEE, London, 1998, ISBN 0852969015.
- [3] Milojković, J., and Litovski, V., "Dynamic Short-Term Forecasting of Electricity Load Using Feed-Forward ANNs", Int. J. of Engineering Intelligent Systems for Electrical Engineering and Communications, 2009, Vol. 17, No. 1, pp. 39-48.
- [4] Milojković, J., and Litovski, V., "Short Term Forecasting in Electronics", Int. J. of Electronics, Vol. 98, No. 2, 2011, pp. 161-172.
- [5] Milenković, S., Obradović, Z., and Litovski, V., "Annealing Based Dynamic Learning in Second-Order Neural Networks", Int. Conf. on Neural Networks, ICNN '96, Washington, D.C., USA, 3.-6. June 1996, pp. 458-463.
- [6] Litovski V., Andrejević M., and Zwolinski M., "Analogue Electronic Circuit Diagnosis Based on ANNs", Microelectronics Reliability, Vol. 46, No. 8, August 2006, pp. 1382-1391.
- [7] Andrejević-Stošović, M., Milovanović, D., and Litovski, V., "Hierarchical Approach to Diagnosis of Mixed-mode Circuits Using Artificial Neural Networks", Neural Network World, Vol. 21, No. 2, 2011, pp. 153-168.
- [8] Sokolović, M., Litovski, V., Zwolinski, M., "New Concepts of Worst Case Delay and Yield Estimation in Asynchronous VLSI Circuits", Microelectronics Reliability, 2009, Vol. 49, No. 2, pp. 186-198.
- [9] Lhermitte, S., et al., "A comparison of time series similarity measures for classification and change detection of ecosystem dynamics", Remote Sensing of Environment, Vol. 115, No. 12, 15 December 2011, pp. 3129–3152.
- [10] Terzija, V., and Stanojević, V., "STLS Algorithm for Power-Quality Indices Estimation", IEEE Transactions on Power Delivery, April 2008, Vol. 24, No. 2, pp. 544-552.
- [11] Goertzel, G., "An Algorithm for the Evaluation of Finite Trigonometric Series", The American Mathematical Monthly, January 1958, No. 1, Vol. 65, pp. 34-35.
- [12] Makowski, M., and Granat, J., "Multicriteria Analysis of Large Sets of Alternatives", 21st CSM Workshop, IIASA, Laxenburg, Austria, August 27–29, 2007.

# Glitch Free Clock Switching Techniques in Modern Microcontrollers

Borisav Jovanović, Milunka Damnjanović

*Abstract* - Multi-frequency clock signals are widely used in chips, especially in the communications area. The clock frequencies can be totally unrelated or they may be multiples of each other. In either case, there is a chance of generating a glitch on the clock line at the moment the clock source is switched. The paper presents the implementation of glitch-free clock switching techniques applied in the design of an 8051 microcontroller.

Keywords - Multi-frequency clock signals, clock switching, microcontroller

#### I. INTRODUCTION

Multi-frequency clock signals are widely used in digital circuits, especially in the communications area. The frequencies of these clock signals can be multiples of each other or totally unrelated. Modern microprocessors also utilize multiple clock signals. They are able to work under different load conditions and operate at several clock frequencies. When a large amount of data processing is required from the microprocessor, its speed is set to the maximum level. Under other operating conditions, for example when microprocessors are embedded in wireless sensor nodes, the power consumption is drastically reduced. The clock frequency affects the dynamic component of power dissipation, so decreasing the clock frequency reduces the total power consumption.

In digital systems with multiple clock signals there is a chance of generating a glitch or a chopped pulse on the clock line at the moment the clock signal is changed. Special attention has to be paid to avoiding such timing problems. The paper presents the implementation of glitch-free clock switching techniques applied in the design of a microcontroller.

#### II. GLITCH-FREE CLOCK SWITCH CIRCUIT

The simplest clock switch is the multiplexer circuit (Fig. 1). The multiplexer is composed of AND, OR and INVERTER logic gates. The switch takes two clock sources at its inputs (signals CLKA and CLKB). When the value of the signal SEL changes, the multiplexer changes the clock source that is passed to the output. The frequencies of the clock signals CLKA and CLKB can be multiples of each other, or they may not be related in any way. The select control signal SEL is usually generated by some sequential circuit, driven by either CLKA or CLKB.

Borisav Jovanović and Milunka Damnjanović are with the Department of Electronics, Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, 18000 Niš, Serbia, E-mail: {borisav.jovanovic, milunka.damnjanovic}@elfak.ni.ac.rs.



Fig. 1. The clock switch based on multiplexer circuit

Unfortunately, the switch may generate a chopped clock pulse or a glitch at the output CLK (Fig. 2). Switching that occurs while the output clock is high has to be avoided, regardless of the frequencies or the phase relationship of the input clock signals.



Fig. 2. The signal waveforms of multiplexer circuit
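The chopped pulse can be reproduced with a small discrete-time model of the multiplexer. This is a sketch under stated assumptions: the clock periods and the switching instant are arbitrary time steps chosen for illustration, not values from the paper.

```python
def mux(clka, clkb, sel):
    """Plain 2:1 multiplexer: CLK = (CLKA and not SEL) or (CLKB and SEL)."""
    return int((clka and not sel) or (clkb and sel))


def square_wave(t, period):
    """1 during the first half of each period, 0 during the second half."""
    return 1 if (t % period) < period // 2 else 0


def pulse_widths(samples):
    """Lengths of all completed runs of 1s in a list of 0/1 samples."""
    widths, run = [], 0
    for s in samples:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths


# CLKA has period 8 and CLKB period 20. SEL changes from CLKA to CLKB
# at t = 17, in the middle of a CLKA high phase.
out = [mux(square_wave(t, 8), square_wave(t, 20), t >= 17) for t in range(60)]
print(pulse_widths(out))  # [4, 4, 1, 10, 10]
```

The pulse of width 1 is shorter than the high phase of either input clock, which is exactly the chopped signal that the circuits in the rest of this section are meant to prevent.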

A clock switch circuit that prevents glitch generation at the output is presented in Fig. 3 [1]. The circuit can be used when the frequencies of the input clock signals are multiples of each other. The input clock signals can be generated by a clock divider circuit.



Fig. 3. Glitch-free clock switch circuit

Negative-edge-triggered D flip-flops are inserted in the selection path of each clock source (Fig. 3). The D flip-flops store the selection signals EnA and EnB. New values of EnA and EnB are stored on negative edges of the clock signals CLKA and CLKB, respectively. This protects against glitches at the output CLK.

The next clock source is selected only after the previous clock has been deselected. In other words, the switch waits for deselection of the current clock before it starts propagating the next clock source. This guarantees that no changes occur at the output while either of the clocks is at the high level, thus avoiding a chopped pulse at the output.

[Waveform diagram of the signals CLKA, CLKB, SEL, ENA, ENB and CLK]

Fig. 4. The signal waveforms of the glitch-free clock switch circuit

Fig. 4 presents the waveforms of the clock switch circuit signals. The transition of the signal SEL from 0 to 1 first stops the propagation of CLKA to the output CLK. This happens at the next falling edge of CLKA, when the signal ENA is reset. At the following negative edge of CLKB, the signal ENB is set and the propagation of CLKB starts. As a result, there is no glitch or chopped pulse at the clock switch output CLK.
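The switching sequence can be checked with a behavioural model. The sketch below assumes that each enable flip-flop is clocked by the falling edge of the clock it gates and that the enable inputs are `not SEL and not EnB` for EnA and `SEL and not EnA` for EnB. These equations are an assumption based on the description above, not a transcription of Fig. 3, and the clock periods are arbitrary time steps.

```python
def square_wave(t, period):
    """1 during the first half of each period, 0 during the second half."""
    return 1 if (t % period) < period // 2 else 0


def pulse_widths(samples):
    """Lengths of all completed runs of 1s in a list of 0/1 samples."""
    widths, run = [], 0
    for s in samples:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths


def simulate_switch(clka, clkb, sel):
    """Behavioural model of a two-clock glitch-free switch."""
    ena, enb = 1, 0                        # start with CLKA selected
    prev_a, prev_b = clka[0], clkb[0]
    out = []
    for a, b, s in zip(clka, clkb, sel):
        fall_a = prev_a == 1 and a == 0
        fall_b = prev_b == 1 and b == 0
        # Both flip-flops sample the enable values present before the edge.
        next_ena = int(not s and not enb) if fall_a else ena
        next_enb = int(bool(s) and not ena) if fall_b else enb
        ena, enb = next_ena, next_enb
        out.append((a & ena) | (b & enb))
        prev_a, prev_b = a, b
    return out


steps = range(80)
clka = [square_wave(t, 8) for t in steps]
clkb = [square_wave(t, 20) for t in steps]
sel = [int(t >= 17) for t in steps]        # SEL rises while CLKA is high
print(pulse_widths(simulate_switch(clka, clkb, sel)))  # [4, 4, 4, 10, 10]
```

With the same stimulus that chops a pulse in a plain multiplexer, every output pulse here is a full high phase of CLKA (4 steps) or of CLKB (10 steps).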

# III. THE IMPLEMENTATION OF GLITCH-FREE CLOCK SWITCH IN 8051 MICROCONTROLLER

#### A. The microcontroller

The microcontroller block (MCU) has the standard 8051 instruction set, which contains 255 different instructions. The 8051 complex instruction set is popular and supported by many software development tools [2, 3].

The microcontroller core is quite fast: one-byte instructions are executed in only two clock cycles. For comparison, the execution time of one-byte instructions in the industry-standard 8051 microcontroller is 12 clock cycles.

The proposed microcontroller block consists of the core, the memory blocks, the block for programming and initialization, and the peripheral units.

The MCU core fetches, decodes and executes instructions. It consists of the control logic block, the arithmetic logic unit (ALU) and the Special Function Register (SFR) control logic.

The peripherals comprise three digital input/output parallel ports (P0, P1 and P2) and several communication modules: one universal synchronous/asynchronous receiver/transmitter (USART) and an I2C interface. Three standard timer/counter circuits are also present.

The memory organization is similar to that of the industry standard 8051 microcontroller. The main memory areas are:

- program memory (on-chip 8kB SRAM block),
- external data memory XRAM (physically an on-chip 2kB SRAM block),
- internal data memory IRAM (consisting of an on-chip 256-byte internal dual-port RAM and the Special Function Register block).

All SRAM memory blocks are physically located on the chip. The implemented MCU has no non-volatile memory for program code storage. Instead, it uses the on-chip SRAM and an external serial 24LC64 EEPROM chip. After every reset, the program code is automatically loaded from the external EEPROM chip into the 8kB SRAM program memory.

The microcontroller was implemented using a commercial 65nm digital standard cell library and the Cadence tool suite [4].

The power consumption of the microcontroller can be divided into two components: dynamic and static.

The static power is caused by leakage currents in the MOS transistors. In newer technologies the leakage current grows rapidly relative to the dynamic current component [5], and techniques have been proposed to address the leakage problem. The design layout is divided into power domains with separate power and ground lines [6]. The power domains may operate at different supply voltages. In addition, special transistors are inserted into the domains to switch off the parts of the chip that are currently inactive [7].

Dynamic power consumption depends on the operating frequency and the supply voltage. Since the implemented microcontroller works at a fixed supply voltage of 1.2V, the dynamic power consumption is reduced by clock gating and by decreasing the operating frequency. The proposed MCU is able to work under different load conditions, and several operating frequencies are available. When a large amount of processing is required, the speed is set to the maximum of 60MHz. When the processing load is low, lower clock frequencies are used.

#### B. The glitch free clock divider circuit

To reduce the dynamic power in applications that do not require intensive data processing, the MCU operating frequency is reduced. The MCU incorporates a clock divider circuit with six output signals (Fig. 5). The following clock signals are available: 60MHz, 30MHz, 15MHz, 7.5MHz, 3.75MHz and 1.875MHz.



Fig. 5. The clock divider circuit implemented within the microcontroller
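As a first-order reminder of why the divider saves power, the dynamic power of a CMOS circuit is commonly modelled as

$$P_{dyn} = \alpha \cdot C \cdot V_{DD}^{2} \cdot f$$

where the switching activity $\alpha$ and the switched capacitance $C$ are symbols introduced here for illustration and do not appear in the paper. At the fixed supply voltage of 1.2V, and assuming that $\alpha$ and $C$ do not change, the ratio of the dynamic power at the lowest and the highest divider output is

$$\frac{P_{dyn}(1.875\,\text{MHz})}{P_{dyn}(60\,\text{MHz})} = \frac{1.875}{60} = \frac{1}{32}$$

This is only a scaling estimate. It says nothing about the static component, which does not depend on the clock frequency.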

The operating frequency can be changed on the fly during program execution. The operating speed is programmed simply by writing a new value into the PMSR register. PMSR is one of the registers in the Special Function Register set and has the hexadecimal address 0x8E. The three least-significant bits of the PMSR register select the frequency of the MCU clock signal. Table I gives the relation between the PMSR content and the selected frequency.

TABLE I PMSR register content for selecting the clock frequency

| PMSR(2:0) | CLKA(i)  |
|-----------|----------|
| "000"     | 60MHz    |
| "001"     | 30MHz    |
| "010"     | 15MHz    |
| "011"     | 7.5MHz   |
| "100"     | 3.75MHz  |
| "101"     | 1.875MHz |
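The relation in Table I amounts to dividing the 60MHz clock by a power of two. A short Python sketch of that mapping follows. The function name is illustrative, and the behaviour for the codes "110" and "111", which Table I does not list, is an assumption (here they are rejected).

```python
F_MAX_HZ = 60_000_000  # undivided clock frequency, first row of Table I


def selected_frequency(pmsr):
    """Clock frequency in Hz selected by the three LSBs of PMSR."""
    code = pmsr & 0b111          # only PMSR(2:0) select the clock
    if code > 5:
        raise ValueError("codes 110 and 111 are not listed in Table I")
    return F_MAX_HZ / (1 << code)


for code in range(6):
    print(format(code, "03b"), selected_frequency(code) / 1e6, "MHz")
```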

A glitch on the clock line is hazardous to the whole system, as it could be interpreted as a capturing clock edge by some registers while being missed by others. The circuit implementing glitch-free clock signal selection is given in Fig. 6.



Fig. 6. The circuit for glitch-free clock signal selection implemented in microcontroller

The circuit takes the clock divider output signals CLKA(i) (i=0,...,5) at its inputs and produces the glitch-free clock signal CLKC, which is fed to the microcontroller clock input. The input Sel(2:0) represents the 3-bit content of the PMSR register, selecting one of the clock frequencies.

The circuit consists of six identical clock switch cells (i=0,...,5) implementing the selection paths for the six clock sources. Each cell enables the propagation of one clock source CLKA(i) to the switch output. The cell produces the internal signal CLKB(i), which is fed to an input of the OR gate.

Each cell contains a D flip-flop which stores the clock enable signal En(i). The value of En(i) changes at negative edges of the clock signal CLKA(i). The signal En(i) is set when the input signal Sel(2:0) is equal to the number i and the enable signals of all other cells, En(j), j≠i, are in the reset state.
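The enable condition of one cell can be written as a small next-state function. This is a sketch of the rule stated above only: the function name is illustrative, and the clocking of each flip-flop by its own CLKA(i) is not modelled.

```python
def next_enable(i, sel, en):
    """Value loaded into En(i) at the next negative edge of CLKA(i).

    En(i) is set only if Sel selects cell i and every other enable
    En(j), j != i, is currently reset.
    """
    others_reset = all(en[j] == 0 for j in range(len(en)) if j != i)
    return int(sel == i and others_reset)


en = [1, 0, 0, 0, 0, 0]              # cell 0 (60MHz) currently enabled
print(next_enable(0, 2, en))         # cell 0 is deselected first
print(next_enable(2, 2, en))         # cell 2 must wait while En(0) is set
print(next_enable(2, 2, [0] * 6))    # cell 2 is enabled once all are reset
```

The three calls print 0, 0 and 1, which is the break-before-make ordering that keeps two clocks from reaching the OR gate at the same time.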

Fig. 7 shows the glitch-free clock signal CLKC when the operating speed is changed by writing different values into the PMSR register.

[Waveform diagram of SEL(2:0) stepping through the values 0 to 4 and of the clock signal CLKC]

Fig. 7. The signal waveforms of the glitch-free clock switch circuit implemented within the microcontroller

#### C. The clock gating

Clock tree power dissipation is a significant component of the MCU's power consumption because the clock is fed to all sequential standard cells and the clock signal switches every cycle. Clock gating is an efficient technique for dynamic power reduction and was used frequently in the microcontroller design.

To avoid glitches on clock gate output signals, special cells were used. The standard cell library TCBN65LP [4], in which the microcontroller is implemented, offers two standard cells that are used as gated clock latches:

- CKLHQ Negative-edge gated clock latch with Q output only
- CKLNQ Positive-edge gated clock latch with Q output only

Both types of cells have several versions with different drive strengths. For example, the cell CKLHQ has the following variants: CKLHQD1, CKLHQD2, CKLHQD4, CKLHQD6BWP7T, CKLHQD8 (with the strongest load drive).
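The reason a latch is used instead of a plain AND gate can be seen in a generic behavioural model. The sketch below is not taken from the TCBN65LP documentation. It assumes an active-high clock and a latch that is transparent while the clock is low, which is the usual structure of a positive-edge gated clock latch.

```python
def gated_clock(clk, enable):
    """Latch-based clock gate for an active-high clock.

    The enable is captured by a latch that is transparent while the
    clock is low, so the gated output can change only together with
    the clock itself.
    """
    latched, out = 0, []
    for c, e in zip(clk, enable):
        if c == 0:
            latched = e          # latch follows enable in the low phase
        out.append(c & latched)
    return out


clk    = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
enable = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # changes while clk is high
naive  = [c & e for c, e in zip(clk, enable)]   # plain AND gate

print(naive)                      # [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(gated_clock(clk, enable))   # [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```

The AND gate produces two chopped pulses, while the latch-based gate passes exactly one complete clock pulse.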

The clock gating cells are included in the RTL descriptions of the microcontroller parts by simply instantiating these cells in the VHDL code:

Cell\_lab: **CKLHQ** port map (TE=>sig1, E=>sig2, CPN=>CLK, Q=>sig3);

Fig. 8. The specification of clock gating cell in VHDL code

The synthesis tool recognized the clock gating cells in the VHDL descriptions and replaced them with cells of appropriate drive strength. Unfortunately, the standard cell library does not offer clock gating latches with a reset input. Such cells were needed in some MCU components, so custom-made gating cells were used, whose schematic is given in Fig. 9.

The clock tree was synthesized by the CTS (Clock Tree Synthesis) tool of SoC Encounter [8]. The clock tree generation process was directed by information written in timing constraint files. The constraints included maximum delays and skews on clock signals. The CTS tool automatically determined the number of clock tree levels and balanced the clock phase delays with appropriately sized clock buffers. In addition, CTS performed timing analysis and optimization through the clock gating logic.



Fig. 9. Custom-made clock gate made of standard cells



Fig. 10. The signal waveforms of the clock gate circuit

After the layout was generated, the logical verification of the final netlist was performed. The exact signal delays were extracted from the layout and used together with the standard cell timing information provided by the timing libraries. No timing problems (setup and hold warnings or logical errors) related to clock tree generation and clock gates appeared in the simulations.

#### IV. CONCLUSION

The implemented microcontroller operates at several clock frequencies and is able to work under different load conditions.

The architecture of the microcontroller clock switch block was considered. The proposed circuit implements glitch-free clock signal selection for safe MCU operation. The MCU provides six clock frequencies: 60MHz, 30MHz, 15MHz, 7.5MHz, 3.75MHz and 1.875MHz. The operating speed is programmed simply by writing to a special function register.

Clock gating was used frequently in the microcontroller design. To avoid glitches on clock gate output signals, special cells were used. The logical verification of the final layout netlist showed no timing problems.

#### ACKNOWLEDGEMENT

Results presented in this paper are part of achievements obtained within the project TR32004 funded by the Serbian Ministry of Science and Technology Development.

#### References

- [1] Mahmud R., "Techniques to make clock switching glitch free", EETimes, June 30, 2003.
- [2] KEIL C compiler and development tool for C51 microcontrollers, http://www.keil.com/c51/devproc.asp
- [3] SDCC 8051 compiler documentation, http://sdcc.sourceforge.net/
- [4] TSMC 65nm LP Standard Cell Libraries TCBN65LP http://www.europractice-ic.com/libraries TSMC.php
- [5] Bipul, P., Agarwal, A., Roy, K., "Low-Power Design Techniques for Scaled Technologies", Integration, The VLSI Journal 2006, pp.64–89
- [6] Katkoori, S., Roy, S., Ranganathan, N. "A Framework for Power-Gating Functional Units in Embedded Microprocessors", IEEE Transactions on VLSI Systems, 2009, Vol.17, N.11, pp.1640-164
- [7] Keating, M., Flynn, D., Aitken, R., Gibbons, A., Shi, K., "Low Power Methodology Manual", Springer, 2007
- [8] SoC Encounter User Guide SoceUG.pdf, Cadence documentation

# Testing an SCA hardened combinational standard cell: preliminary considerations

Milena Stanojlović, Vančo Litovski and Predrag Petković

*Abstract* - This paper describes testing of the NSDDL AND cell, which is part of the NSDDL (No Short-circuit current Dynamic Differential Logic) side-channel-attack-resistant library. The simulation results illustrate the vulnerability of the AND logic cell in the presence of a defect. A fault dictionary will be created based on repetitive simulation performed on the circuit-level description of the AND cell with faults inserted one by one. Only short-circuit faults will be considered. The cells are designed in CMOS TSMC035 technology using Mentor Graphics design tools.

Keywords - crypto-system, SCA, testing, defect.

#### I. INTRODUCTION

The significance of the information encrypted within running messages provokes adversaries to try to disclose their contents. Any illegal attempt to access encrypted content is treated as an attack on the cryptographic system. A common way of disclosing encrypted information without authorization relies on attempts to find combinations that allow the encryption key to be detected. Complex cryptographic algorithms are designed to discourage the attacker, or to impede breaking the key by searching through all possible combinations in real time. Additional information about the behaviour of an electronic crypto-system can significantly reduce the number of combinations that need to be explored to break a cipher [1]. Collecting such information is known as a Side Channel Attack (SCA). The most popular SCA methods rely on monitoring the dynamic power consumption of the electronic crypto-system. The most effective are SPA (Simple Power Analysis), DPA (Differential Power Analysis) and EMA (Electromagnetic Analysis) [2, 3].

The supply current $(I_{DD})$ is a very important additional source of information about the behaviour of cryptographic systems. An abrupt change of $I_{DD}$ in a CMOS digital circuit occurs only during a transition between logic states. When the output changes from 0 to 1, the output capacitances are charged to $V_{DD}$ through the PMOS network. When the state changes from 1 to 0, the capacitances are discharged to ground. In addition, during a transition some short-circuit current flows while the PMOS and NMOS transistors are on simultaneously. Attackers are able to provide stimulus data, but cannot access the points at which they could register the response. The only source of information about the behaviour of the circuit is its activity, expressed through the change of the supply current. Obviously, the information on circuit consumption is correlated with the circuit activity. To counter these attacks, various cryptographic methods are implemented as hardware solutions.

Milena Stanojlović is with ICAT, Vojvode Mišića 58/2, 18000 Niš, and also with LEDA laboratory, Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, Serbia, E-mail: (milena@venus.elfak.ni.ac.rs)

Vančo Litovski and Predrag Petković are with the Department of Electronics, Faculty of Electronic Engineering, University of Niš, Aleksandra Medvedeva 14, 18000 Niš, Serbia, E-mail: (vanco.litovski@elfak.ni.ac.rs), E-mail: (predrag.petkovic@elfak.ni.ac.rs)

We chose NSDDL (No Short-circuit current Dynamic Differential Logic) [4] as the hardware method for data protection. The method is based on a modification of the TDPL (Three-Phase Dual-Rail Pre-Charge Logic) approach, which introduces a third phase of operation during which all the capacitors in the circuit are discharged [5]. An important novelty of the NSDDL method is its immunity to unbalanced loads at the true and false outputs. In addition, the method requires only one new cell, which is combined with standard logic cells.

To our knowledge, test sequence synthesis and, more generally, testing of NSDDL-based circuits have not been considered in the literature, and in that sense this work is pioneering. Namely, the NSDDL method, being based on the (anti-)symmetry of two circuits named TRUE and FALSE (as will be explained later in this paper), is by nature susceptible to faults that disturb the symmetry. From that point of view, testing such circuits or, better to say, test signal synthesis should be a relatively straightforward task. The goal of this paper is to propose a procedure for test signal synthesis and to give the first answers as to how testable this kind of circuit is. This is a continuation of our research on testing NSDDL circuits, since fault simulation of a sequential circuit was performed in [6].

To demonstrate the procedure we usually implement in such situations [7], in this paper we consider testing of one of the simplest NSDDL cells, the AND circuit. After short-circuit defects are inserted into the fault-free circuit, the output signal and the corresponding NSD value (calculated from the supply current) are monitored by simulation for each defect and for certain combinations of input signals. Namely, besides examining the logic function of the circuit, it is also very important to compare the supply currents of the faulty and fault-free circuits. When a defect is present in the circuit, it is very likely to show up as a change of the supply current [6]. The number of simulations depends on the number of defects that are tested.

Simulation results were obtained using the *ELDO* simulator of the *Mentor Graphics Design Architect* environment. To get proper circuit parameters for simulation, the layout was designed first and post-layout parameter extraction followed. The *IC studio Mentor Graphics tools* were used to draw the layout.

The next section reviews the core of the NSDDL method. The third section explores the design methodology and SCA resistance of the AND NSDDL cell. The simulation results for the AND logic cell in the presence of a defect are described in the fourth section. A fault dictionary is created in order to allow for verification of the input signal for testing purposes.

#### II. NSDDL METHOD

Cells resistant to SCA are based on the idea that every combination of input signals results in the same power consumption. This is possible when every logic cell has a counterpart that reacts in a complementary way. Therefore every functional cell has two outputs, denoted as *true* and *false*. The hardware is doubled, but the true function of the cell is masked.

The NSDDL method is based on three-phase clocking. The first phase, named *pre-charge*, drives all outputs (true and false) of all logic cells to the high logic level. In the second phase, known as the *evaluation* phase, the true output takes the desired value and the false output takes the complementary value. The third phase is named *discharge* because all outputs go to the low logic level.

The advantage of this method compared to other popular solutions, like WDDL [8], is its immunity to unbalanced loads at the true and false outputs. This is achieved by using a dynamic NOR circuit (DNOR), which minimizes the impact of short-circuit currents in the CMOS circuit. It is an integral part of the control logic and of the NSDDL cells. Figure 1 illustrates the circuitry of the DNOR cell.



Fig. 2 illustrates the waveforms of the control signals. During the *pre-charge* phase, PRE=0 and DIS=0; transistor M1 is *on*, while the other transistors are *off*. The output goes to logic 1 regardless of the input signal IN. The *evaluation* phase begins when PRE=1 and DIS=0. Then M1 and M4 turn *off*, M2 is *on*, and the input signal IN controls the state of transistor M3. If IN=0, M3 is *off*, so the output remains at logic 1. If IN=1, M3 and M2 are *on* and the output switches to 0. Hence the output is the inverted input signal. The *discharge* phase occurs when PRE=1 and DIS=1. Then M3 is *off* and M4 is *on*, and the output goes to the low logic level regardless of the input signal.



Fig. 2 Time waveforms of control signals for DNOR cell
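At the logic level, the three phases of the DNOR cell described above reduce to a small truth function. The Python sketch below models only that logical behaviour, not the transistor circuit. The combination PRE=0, DIS=1 is not described in the text, so it is rejected here as an assumption.

```python
def dnor(pre, dis, inp):
    """Logic-level output of the DNOR cell for signals PRE, DIS and IN."""
    if pre == 0 and dis == 0:
        return 1            # pre-charge: output forced to logic 1
    if pre == 1 and dis == 0:
        return 1 - inp      # evaluation: output is the inverted input
    if pre == 1 and dis == 1:
        return 0            # discharge: output forced to logic 0
    raise ValueError("PRE=0, DIS=1 is not described in the text")


# One full cycle (pre-charge, evaluation, discharge) for both input values.
for inp in (0, 1):
    print(inp, [dnor(0, 0, inp), dnor(1, 0, inp), dnor(1, 1, inp)])
```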

### III. NSDDL AND/NAND/OR/NOR CELL

This section recalls the results obtained for NSDDL AND/NAND/OR/NOR cells [9-10]. All functions are implemented using the usual negative-logic circuits (NAND and NOR), which are easily implemented in CMOS technology. Using De Morgan's rules, it is easy to see that a simple permutation of the input signals (*A*, notA, *B*, notB) provides four different logic functions with the same hardware. Therefore this structure is named the NSDDL AND/NAND/OR/NOR SCA-resistant cell.
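The De Morgan argument can be checked exhaustively on a logic-level abstraction of a dual-rail cell. The function below is an illustrative model that returns the pair (true output, false output). It is an assumption made for this sketch and does not reproduce the gate-level structure of the actual NSDDL cell.

```python
def dual_rail_core(a, a_n, b, b_n):
    """Abstract dual-rail core: (true, false) = (a AND b, a_n OR b_n)."""
    return a & b, a_n | b_n


for A in (0, 1):
    for B in (0, 1):
        nA, nB = 1 - A, 1 - B

        # Direct connection: AND on the true output, NAND on the false one.
        t, f = dual_rail_core(A, nA, B, nB)
        assert (t, f) == (A & B, 1 - (A & B))

        # Each input swapped with its complement: by De Morgan's rules the
        # same hardware now gives NOR on the true output and OR on the
        # false one.
        t, f = dual_rail_core(nA, A, nB, B)
        assert (t, f) == (1 - (A | B), A | B)

print("AND, NAND, NOR and OR obtained from one core")
```

Reading the true and false outputs in exchanged roles then selects between AND and NAND, or between NOR and OR, without any change to the hardware.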

The DNOR circuit is the basic element of all SCA-resistant cells in the NSDDL technique. The prime role of this circuit is to decrease the short-circuit current in the CMOS circuit. Moreover, it provides the inverting function when transforming from standard to NSDDL logic.

The block diagram of the NSDDL OR SCA-resistant cell is presented in Figure 3. Since the NSDDL OR and NOR cells implement mutually complementary functions, they can obviously be realized using the same hardware. The only difference is the meaning of the true and the false outputs.



Fig. 3. Block scheme of the NSDDL OR SCA-resistant cell



Fig. 4. Block scheme of the NSDDL AND SCA-resistant cell

Figure 4 illustrates the NSDDL AND and NAND cells. The NAND function is obtained when the true and the false outputs exchange positions, under the same conditions.

In order to estimate SCA resistance, we consider the energy needed for an output state transition for different combinations of input signals. As a reference we use standard AND, NAND, OR and NOR cells and compare the behaviour of the standard and NSDDL cells. For standard cells one can expect a strong correlation between the energy required for a particular transition and the combination of input signals. In particular, any neutral event requires minimal energy, while a rising transition at the output needs more current to charge the output capacitance. NSDDL cells are designed with the intention of masking the cell operation as seen through $I_{DD}$. Therefore they should exhibit minimal correlation between stimulus signals and $I_{DD}$. Table I summarizes the results of the comparison.

 TABLE I

 CHARACTERISTICS COMPARISON OF CLASSIC AND NSDDL CELLS

| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|
| A | B | E<sub>ANDc</sub><br>[pJ] | E<sub>NANDc</sub><br>[pJ] | E<sub>ORc</sub><br>[pJ] | E<sub>NORc</sub><br>[pJ] | E<sub>NSDDL</sub><br>[pJ] |
| 0 | 1 | 0.05 | 0.05 | -0.49 | -0.46 | -2.80 |
| 0 | → | -0.05 | -0.05 | -0.674 | -0.47 | -2.77 |
| 1 | 0 | 0.05 | 0.05 | -0.50 | -0.50 | -2.77 |
| ↓ | 0 | -0.05 | -0.05 | -0.76 | -0.55 | -2.74 |
| 1 | 1 | -0.72 | -0.69 | -0.44 | -0.43 | -2.75 |
| ↓ | 1 | -0.86 | -0.65 | -0.05 | -0.05 | -2.82 |
| 1 | 1 | -0.65 | -0.62 | 0.05 | 0.05 | -2.77 |
| 1 | ↓ | -0.93 | -0.73 | -0.007 | -0.007 | -2.79 |
| 1 | ↑ | -0.69 | -0.66 | 0.007 | 0.007 | -2.74 |
| ↓ | ↓ | -0.97 | -0.76 | -0.71 | -0.52 | -2.76 |
| E<sub>av</sub> | [pJ] | -0.48 | -0.41 | -0.36 | -0.30 | -2.77 |
| δE | [%] | 210.2 | 196.98 | 222.05 | 202.67 | 2.81 |
| σ | [fJ] | 405.4 | 337.7 | 310.3 | 243.1 | 24.31 |
| NSD | [%] | 83.91 | 82.23 | 85.64 | 82.59 | 0.87 |

Columns 1 and 2 indicate input combinations. Symbols "$\uparrow$" and "$\downarrow$" denote the rise and fall transition, respectively. Columns 3, 4, 5 and 6 present the results obtained for standard AND, NAND, OR and NOR cells, respectively, while column 7 refers to the NSDDL cell.

Energy consumption is expressed as the time integral of instantaneous power ($I_{DD} \cdot V_{DD}$) during one cycle of input signal change. For AND, NAND, OR and NOR this cycle lasts as long as all three operational phases needed for the NSDDL cell. In order to get better insight into the behavior of every cell, we derived the following parameters from the simulation results:

- average energy ($E_{av}$)
- relative difference with respect to $E_{av}$ ($\delta E$)
- standard deviation ($\sigma$)
- normalized standard deviation with respect to $E_{av}$ (*NSD*).

As a measure of SCA resistance we consider the normalized standard deviation. According to Table I, *NSD* is below 1% for the NSDDL cell and above 80% for every standard cell, which indicates that the AND/NAND/OR/NOR NSDDL cell is immune to SCA using DPA.
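As a rough cross-check, the parameters can be recomputed from the rounded energies listed in columns 3 and 7 of Table I. The sketch below assumes that $\sigma$ is the population standard deviation and that *NSD* = $100 \cdot \sigma / |E_{av}|$; neither definition is stated explicitly in the text, and the function name is hypothetical. Because the table entries are rounded, the recomputed values come out close to, but not identical with, the reported ones.

```python
from statistics import mean, pstdev

# Energies in pJ, copied from Table I (column 3: standard AND, column 7: NSDDL).
E_AND = [0.05, -0.05, 0.05, -0.05, -0.72, -0.86, -0.65, -0.93, -0.69, -0.97]
E_NSDDL = [-2.80, -2.77, -2.77, -2.74, -2.75, -2.82, -2.77, -2.79, -2.74, -2.76]


def energy_stats(energies_pj):
    """Return (E_av [pJ], sigma [pJ], NSD [%]) for a list of transition energies."""
    e_av = mean(energies_pj)
    sigma = pstdev(energies_pj)          # assumed: population standard deviation
    nsd = 100.0 * sigma / abs(e_av)      # assumed: sigma normalized to |E_av|
    return e_av, sigma, nsd


if __name__ == "__main__":
    for name, data in (("AND", E_AND), ("NSDDL", E_NSDDL)):
        e_av, sigma, nsd = energy_stats(data)
        print(f"{name}: E_av={e_av:.3f} pJ, sigma={sigma * 1000:.1f} fJ, NSD={nsd:.2f} %")
```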

Figure 5 illustrates the layout of the SCA resistant AND/NAND/OR/NOR cell. The layouts of NSDDL cells that perform a particular logic function (AND, NAND, OR or NOR) differ only in the order of the input and output ports that form the desired function. By the rule of symmetry, the true and false parts of the circuit are mirrored.



Fig. 5. Layout of SCA resistant NSDDL AND/NAND/OR/NOR cell

# IV. TESTING OF NSDDL AND CELL

To create a fault dictionary, one first defines the set of defects to be tested. After that, the defects are inserted into the circuit, one at a time, in order to analyze the effect of defect propagation. Two categories of defects are considered: catastrophic faults, which include shorts and opens, and soft faults, to which delay faults belong. Here only one sub-category will be considered: shorts between the transistor terminals. To get the response of the faulty circuit, namely the fault effect, one has to perform electrical simulation of the faulty circuit. Of course, a test signal must be established beforehand that is capable of exposing the fault effect, if present, in the response(s) of the faulty circuit.



Fig. 6. Block diagram of NSDDL AND cell

*True* and *False* blocks are marked with dashed rectangles in Figure 6 and their outputs are denoted as OT and OF, respectively. Observing this figure, one can see that these blocks have complementary structure, where OT depends on At and Bt, while OF is a function of Af and Bf. Figure 7.a shows an SCA unprotected NAND cell as a generic block, while Figure 7.b shows the schematic (taken from the TSMC035u library [11]) with marked defects.

Transistors are denoted with $P_i\_F_{xy}$ or $N_j\_F_{xy}$, where P and N represent the type of the transistor. The indices *i* = 1-4 and *j* = 1-4 identify the PMOS and NMOS transistors, respectively. $F_{xy}$ denotes a short-circuit between the *x* and *y* terminals of the given transistor. Therefore *xy* can take values from the set {GD, GS, SD}, where GD stands for gate-drain, GS for gate-source, and SD for source-drain.



Fig. 7. Standard NAND cell a) generic representation b) standard CMOS realization with marked defects

 TABLE II

 COVERAGE OF DEFECTS FOR TRUE SUB CIRCUITS

| Type of defects | Signal values | | | | | | | | | | | | NSD<sub>NSDDL AND</sub> [%] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| At | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | NA |
| Bt | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | NA |
| Fault free OT | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0.87 |
| P<sub>1</sub>_F<sub>GD</sub> | ↓ | 0 | 0 | 0 | 0 | → | 0 | 0 | 0 | 0 | 0 | 0 | 31.64 |
| P<sub>1</sub>_F<sub>GS</sub> | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 54.11 |
| P<sub>1</sub>_F<sub>SD</sub> | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 53.64 |
| P<sub>2</sub>_F<sub>GD</sub> | 0 | 0 | → | 0 | 0 | 0 | 0 | → | 0 | 0 | 0 | 0 | 29.54 |
| P<sub>2</sub>_F<sub>GS</sub> | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 54.11 |
| P<sub>2</sub>_F<sub>SD</sub> | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 53.64 |
| N<sub>1</sub>_F<sub>GD</sub> | ↓ | 0 | 0 | 0 | 0 | → | 0 | 0 | 0 | 0 | 0 | 0 | 31.67 |
| N<sub>1</sub>_F<sub>GS</sub> | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.48 |
| N<sub>1</sub>_F<sub>SD</sub> | ↓ | 0 | 0 | 0 | 1 | → | 1 | 0 | 1 | 0 | 0 | 0 | 151.0 |
| N<sub>2</sub>_F<sub>GD</sub> | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 37.28 |
| N<sub>2</sub>_F<sub>GS</sub> | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 |
| N<sub>2</sub>_F<sub>SD</sub> | 0 | 0 | ↓ | 0 | 1 | 0 | 1 | ↓ | 1 | 0 | 0 | 0 | 151.2 |

The same approach is applied to the NOR cell, presented in Figures 8.a and 8.b (schematic also taken from the TSMC035u library) with marked defects.



Fig. 8. Standard NOR cell a) generic representation b) standard CMOS realization with marked defects

 TABLE III

 COVERAGE OF DEFECTS FOR FALSE SUB CIRCUITS

| Type of defects | Signal values | | | | | | | | | | | | NSD<sub>NSDDL AND</sub> [%] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Af | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | NA |
| Bf | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | NA |
| Fault free OF | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.87 |
| P<sub>3</sub>_F<sub>GD</sub> | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 42.99 |
| P<sub>3</sub>_F<sub>GS</sub> | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 54.38 |
| P<sub>3</sub>_F<sub>SD</sub> | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 145.25 |
| P<sub>4</sub>_F<sub>GD</sub> | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 22.77 |
| P<sub>4</sub>_F<sub>GS</sub> | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 57.39 |
| P<sub>4</sub>_F<sub>SD</sub> | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 145.06 |
| N<sub>3</sub>_F<sub>GD</sub> | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 22.77 |
| N<sub>3</sub>_F<sub>GS</sub> | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0.44 |
| N<sub>3</sub>_F<sub>SD</sub> | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 52.86 |
| N<sub>4</sub>_F<sub>GD</sub> | 0 | 1 | 1 | 1 | ↓ | 0 | ↓ | 1 | ↓ | 1 | 1 | 1 | 24.18 |
| N<sub>4</sub>_F<sub>GS</sub> | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0.44 |
| N<sub>4</sub>_F<sub>SD</sub> | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 52.86 |

The effect of every defect is first observed with respect to the logic function of the circuit. When the logic function is violated, the defect can be considered detected. Table II gives the results for the *True* sub-circuit and Table III for the *False* sub-circuit. The symbol "$\downarrow$" denotes the fall transition. Observing the results given in Tables II and III, one can see that all defects in the circuit were detected in this way. Since the operation of the circuit is very specific, the logic function is observed during the *EVALUATION* phase for the fault-free and faulty circuits under the same input conditions.

To illustrate, Figure 9 and Figure 10 depict the waveforms of the important signals in the TRUE and FALSE circuit, respectively, in the fault-free and a faulty case (as indicated). Figures of this kind were created for every row in Table II and Table III.

Fig. 10. Responses of the FALSE circuit in its fault-free and faulty version (fault P4SD)

Testing based on the supply current is an excellent supplement to testing the logic function of a circuit. As discussed above, the *NSD* parameter depends directly on $I_{DD}$ and, for that reason, it is used as a second indicator. It can be seen from Table II and Table III that both criteria indicated the presence of a defect in the circuit for every simulated case. This means that defect coverage is 100% with the test signal given in the first two rows of the tables. This confirms that a gross disruption (the presence of a catastrophic fault) of the NSDDL circuit's symmetry has an apparent influence on its response. That is important for testing but also for evaluating its main function. Namely, in the presence of a fault the circuit is less effective in data protection.
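The two detection criteria can be sketched as a small fault-dictionary check. The responses and *NSD* values below are copied from a few rows of Table II that contain no transition symbols; the tolerance `NSD_TOLERANCE` and all function names are hypothetical, since the text does not state how large an *NSD* deviation counts as a detection.

```python
# Fault-free OT response and NSD [%] from Table II.
FAULT_FREE = ([0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0], 0.87)

# A subset of the fault dictionary: defect name -> (OT response, NSD [%]).
FAULTS = {
    "P1_FGS": ([1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0], 54.11),
    "P1_FSD": ([0] * 12, 53.64),
    "N1_FGS": ([0] * 12, 0.48),
    "N2_FGD": ([0] * 12, 37.28),
}

NSD_TOLERANCE = 0.1  # hypothetical threshold in percentage points


def detect(response, nsd):
    """Return (detected_by_logic, detected_by_nsd) for one simulated defect."""
    by_logic = response != FAULT_FREE[0]
    by_nsd = abs(nsd - FAULT_FREE[1]) > NSD_TOLERANCE
    return by_logic, by_nsd


def coverage(faults):
    """Percentage of defects flagged by at least one criterion."""
    detected = sum(1 for resp, nsd in faults.values() if any(detect(resp, nsd)))
    return 100.0 * detected / len(faults)
```

For the rows used here, both criteria flag every defect, which is consistent with the coverage reported above.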

### V. CONCLUSION

The NSDDL design method for side channel attack hardening of digital electronic circuits is characterized by the implementation of duplicated hardware that provides a true and a false output. The false output has the same function as the inverted true output. The basic idea is to mask the correlation between the supply current and the activity of the cell. This can be achieved if the input signals are doubled.

For testing this cell two criteria were adopted: logic function verification, and IDDQ testing performed by calculating the *NSD* parameter. Twenty-four simulations were performed in order to build the fault dictionary for defects of short-circuit type. After completing the test synthesis procedure for a simple AND gate, one may conclude that the expected results were obtained. Namely, both criteria give excellent coverage of defects: all twenty-four defects were detected in either case. In fact, since the symmetry is violated by the insertion of a fault, the fault effect is immediately visible at the output.

This conclusion, however, has so far been established only for simple circuits such as the NSDDL AND cell. It is our intention to perform in the future an in-depth analysis of detectability and observability of catastrophic faults (including open faults) in much more complex combinational NSDDL circuits.

#### ACKNOWLEDGEMENT

This work was supported by the Serbian Ministry of Education and Science within the project TR 32004.

#### References

- [1] Koc, Cetin Kaya (Ed.) *Cryptographic Engineering*, Springer, 2009.
- [2] Petković P., Stanojlović M. and Litovski V., "Design of side-channel-attack resistive criptographic ASICS", Forum BISEC 2010, Zbornik radova druge konferencija o bezbednosti informacionih sistema, Beograd, Srbija, Maj 2010, pp. 22-27.
- [3] Stanojlović M. and Petković P., "Hardware based strategies against side-channel-attack implemented in WDDL", Electronics, Vol. 14, No. 1, Banja Luka, June, 2010, pp. 117-122.
- [4] J. Quan and G. Bai, "A new method to reduce the side-channel leakage caused by unbalanced capacitances of differential interconnections in dual-rail logic styles", 2009 Sixth International Conference on Information Technology: New Generations, DOI 10.1109/ITNG.2009.185, pp. 58-63.
- [5] M. Bucci, L. Giancane, R. Luzzi, A. Trifiletti, "Three-Phase Dual-Rail Pre-Charge Logic". In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 232–241. Springer, Heidelberg (2006).
- [6] Stanojlović, M. and Litovski, V., "Simulation of defects in sequential NSDDL Master/Slave D flip flop circuit", Proceedings of Small Systems Simulation Symposium 2012, Niš, Serbia, 12th-14th February 2012.
- [7] Milovanović, D. B., and Litovski, V. M., "Fault models of CMOS Circuits", Microelectronics Reliability, Vol. 34, No. 5, pp. 883-896, 1994.
- [8] Danger, J.-L. Guilley, S. Bhasin, S. Nassar, M., "Overview of Dual Rail with Precharge Logic Styles to Thwart Implementation-Level Attacks on Hardware Cryptoprocessors", Proc. of International Conference on Signals, Circuits and Systems SCS'2009, Djerba, Tunisia, November 5-8 2009, pp. 1-8
- [9] Stanojlović, M., Petković, P.: "An ASIC cryptoosystem resistant to side channel attacks based on standard cells", VIII Simposium on Industrial Electronics INDEL 2010, Banja Luka, Bosnia and Herzegovina, 4-6 November, 2010, pp. 110-114, ISBN 978-99955-46-03-8, In Serbian
- [10] Petković, P., Stanojlović, M.: "Hardware protection from side channel attacks based on masking the consumption information", Zbornik LV konferencije ETRAN, Banja Vrućica, Teslić, B&H, 2011, ISBN 978-86-80509-66-2.
- [11] ASIC Design Kit, http://www.mentor.com/company/higher\_ed/ic-asic

# FPGA Implementation of AES Algorithm

Ana Krkljić, Branko Dokić, and Velibor Škobić

*Abstract* – This paper deals with FPGA implementation of the Advanced Encryption Standard with a key length of 128 bits. The Quartus II software is used for synthesis and place and route, while the design was described using the hardware description language VHDL. Simulation of the implemented design was done using ModelSim simulation software.

Keywords - AES, FPGA.

#### I. INTRODUCTION

Two basic techniques for encrypting information are symmetric encryption (also called secret key encryption) and asymmetric encryption (also called public key encryption). Symmetric algorithms are faster, but their main weakness is key distribution. On the other hand, asymmetric encryption overcomes the key security problem, but these algorithms are generally slower. Some systems use asymmetric encryption for secure key exchange combined with symmetric algorithms for fast data encryption. One of the well-respected symmetric algorithms is AES (Advanced Encryption Standard). AES is an encryption standard established by the U.S. National Institute of Standards and Technology (NIST) in 2001, based on the Rijndael algorithm.

The algorithm implemented in this paper is Rijndael, named after its authors Joan Daemen and Vincent Rijmen, two Belgian cryptographers. Rijndael is an iterated block cipher with a variable block length and a variable key length. The block length and the key length can be independently set to 128, 192 or 256 bits [1]. When it became the AES standard, the block length was fixed to 128 bits, while the key length can still take any of the three values mentioned.

Hardware implementations of AES are reported in [2], [3], [4] and [5]. The main tradeoff in hardware implementations of AES is between area efficiency and speed. The throughput achieved in the above-mentioned papers is 352 Mbps, 182.86 Mbps, up to 28.4 Gbps and 462 Mbps, respectively, for AES with a 128-bit key.

# II. AES ALGORITHM

This paper considers the AES algorithm with a block length and a key length of 128 bits. The data block is organized as a 4x4 matrix of 16 bytes, called the state, on which the following transformations are performed:

- SubBytes: a non-linear byte substitution, operating on each of the state bytes independently.
- ShiftRows: the rows of the state are cyclically shifted over different offsets. Row 0 is not shifted, row 1 is shifted over 1 byte, row 2 over 2 bytes and row 3 over 3 bytes.
- MixColumns: columns of the state are multiplied by a predefined matrix of constants.
- AddRoundKey: XOR operation on the round key and the result of the previous transformations.

The described transformations are applied to the plaintext in 11 iterations, also called rounds. As depicted in Figure 1, the initial round includes only AddRoundKey, which means that the plaintext is bitwise XORed with the initial round key. The following 9 rounds include all described transformations, while in the last round MixColumns is skipped.
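Two of the transformations are simple enough to sketch directly. The snippet below is a minimal illustration, assuming the state is stored as a list of four rows of four bytes and that rows are shifted cyclically to the left; the function names are chosen here for illustration and are not taken from the VHDL design.

```python
def shift_rows(state):
    """Cyclically shift row r of a 4x4 state to the left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def add_round_key(state, round_key):
    """Bitwise XOR of the state with a round key of the same 4x4 shape."""
    return [[s ^ k for s, k in zip(s_row, k_row)]
            for s_row, k_row in zip(state, round_key)]


if __name__ == "__main__":
    state = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    print(shift_rows(state))
```

Since XOR is its own inverse, applying `add_round_key` twice with the same key returns the original state, which is why the same operation can serve both encryption and decryption.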



Fig. 1. AES algorithm iterations

The round key is obtained through key expansion, described by the following recursive procedure:

KeyExpansion(byte Key[16], word W[44])

{

for (i=0; i<4; i++)

W[i]=(Key[4\*i], Key[4\*i+1], Key[4\*i+2], Key[4\*i+3]);

for (i=4; i<44; i++)

{ temp=W[i-1];

if (i%4==0)

temp=SubByte(RotByte(temp)) xor Rcon(i/4);

W[i]=W[i-4] xor temp;

}}

A more detailed explanation of the procedure is given in [1]. Key expansion produces an array of 44 columns (32-bit words) representing 11 round keys (4 columns for each round key), for the initial round as well as the 10 following rounds.
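The key expansion for a 128-bit key can be sketched in software as follows. This is an illustrative model, not the RKG hardware module: the S-box is computed from its algebraic definition (inverse in GF(2^8) followed by the affine transformation) instead of being stored as a table, and all function names are chosen here for illustration.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def sbox(x):
    """AES S-box value: multiplicative inverse followed by the affine map."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):          # x^254 is the inverse of x in GF(2^8)
            inv = gf_mul(inv, x)
    res = inv
    for s in (1, 2, 3, 4):            # XOR with four left rotations of inv
        res ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
    return res ^ 0x63


SBOX = [sbox(x) for x in range(256)]


def sub_word(w):
    """Apply the S-box to each of the four bytes of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def key_expansion(key):
    """Expand a 16-byte key into 44 words (11 round keys of 4 words each)."""
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = w[i - 1]
        if i % 4 == 0:
            temp = ((temp << 8) | (temp >> 24)) & 0xFFFFFFFF   # RotByte
            temp = sub_word(temp) ^ (rcon << 24)               # SubByte, Rcon
            rcon = gf_mul(rcon, 2)
        w.append(w[i - 4] ^ temp)
    return w


if __name__ == "__main__":
    key = bytes.fromhex("2B7E151628AED2A6ABF7158809CF4F3C")
    print([f"{word:08x}" for word in key_expansion(key)[4:8]])
```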

#### **III. IMPLEMENTATION**

The listed transformations were implemented as separate hardware entities and then connected together as depicted in Figure 2. The SubBytes transformation was implemented as a look-up table; hence it occupies a lot of logic resources, but provides fast encryption. The control logic module is based on a modulo-11 counter, which generates the round number. That number is used in the RKG module to obtain the round key. RKG stands for RoundKeyGeneration, which performs the key expansion and returns the round key at its output. Control logic also provides the appropriate connection between transformation modules. Thus, in the initial round SubBytes, ShiftRows and MixColumns are skipped and *plaintext* is connected to the ARK module (AddRoundKey transformation). In the final round MixColumns is skipped, while in the other rounds all modules are used. The MEM block stores the value calculated in each round so that it can be passed to the SubBytes input in the next round. The output of the MEM block is connected to the output bus *ciphertext* only in the final round. In round 11 the system does not encrypt any data; it only waits for new data to be ready at the input bus, *plaintext*. When new data is ready at the input bus, the signal at the *ready* input of the control logic module is asserted. This allows synchronization with the previous module that provides a 128-bit data block.

Fig. 2. Block diagram of AES encryption system
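The routing performed by the control logic can be summarized with a short behavioural sketch. It assumes that the round counter runs from 0 (initial round) to 10 (final round); the function name and the numbering are illustrative and do not reproduce the VHDL entity.

```python
def active_modules(round_number):
    """Return the transformation modules used in a given round (0..10)."""
    if not 0 <= round_number <= 10:
        raise ValueError("round number must be in the range 0..10")
    if round_number == 0:
        # Initial round: plaintext goes straight to the ARK module.
        return ["AddRoundKey"]
    if round_number == 10:
        # Final round: MixColumns is bypassed.
        return ["SubBytes", "ShiftRows", "AddRoundKey"]
    return ["SubBytes", "ShiftRows", "MixColumns", "AddRoundKey"]


if __name__ == "__main__":
    for r in range(11):
        print(r, active_modules(r))
```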

#### IV. SIMULATION AND TESTING

Logic verification of the design and the simulation results were obtained using ModelSim simulation software. First, the encryption module and the decryption module were simulated separately, using test vectors provided by NIST. These vectors, given in Appendix D of [1], are as follows:

| Plaintext:  | 3243 | F6A8 | 885A | 308D | 3131 | 98A2 | E037 | 0734 |
|-------------|------|------|------|------|------|------|------|------|
| Key:        | 2B7E | 1516 | 28AE | D2A6 | ABF7 | 1588 | 09CF | 4F3C |
| Ciphertext: | 3925 | 841D | 02DC | 09FB | DC11 | 8597 | 196A | 0B32 |

Fig. 3. The vectors provided by NIST

In this simulation the key was fixed, without loss of generality. It is written in the code of the RoundKeyGeneration and INVRoundKeyGeneration entities, rather than being an input signal. Figure 4 depicts the simulation of the encryption process. It can be seen that, for the given *plaintext* vector at the input, the *ciphertext* vector was obtained at the output. Similarly, Figure 5 depicts the simulation of the decryption process, with the *ciphertext* vector at the input and the *plaintext* at the output of the decryption module.





Fig. 5. Logic simulation of the decryption module

The result depicted in Figure 6 was obtained as follows. The encryption and decryption modules were connected via a 128-bit bus and a test signal in the form of a sine wave was applied to the input of the encryption module. The sine wave at the output also verifies the correctness of encryption and decryption.



Fig. 6. Logic simulation of encryption and decryption

The implemented solution was tested on DE1, Altera's development board. The board features a Wolfson WM8731 audio CODEC with line-in, line-out and microphone-in jacks. A sine wave from a signal generator is applied to the mic-in jack and to oscilloscope channel 1, while oscilloscope channel 2 is connected to the line-out of the board. The input signal is digitized, then encrypted and then serialized. Serial data is transmitted via general IO pins to the decryption module, in this case on the same FPGA device, as depicted in Figure 7.



Fig. 7. Block diagram of a system tested on the DE1 board

The received serial data is parallelized, decrypted and then converted into analogue form, which is observed at the line-out jack. Figure 8 depicts both channel 1 and channel 2 of the oscilloscope.



Fig. 8. Results obtained using DE1 board, signal generator and oscilloscope

The implemented system for encryption and decryption occupies 10 488 logic elements, which is 56% of the logic resources of the *EP2C20F484C7* device. The maximum operating frequency estimated by Quartus II is 132.68 MHz. It takes 11 clock cycles for the system to generate the *ciphertext*, so the encryption throughput is as follows:

$$\text{Throughput [Mbps]} = \frac{\text{data block length [bits]} \cdot \text{maximum operating frequency [MHz]}}{\text{number of clock cycles}} = \frac{128 \cdot 132.68}{11} \approx 1543\ \text{Mbps}$$

#### V. CONCLUSION

An AES encryption/decryption system is implemented on *EP2C20F484C7*, a device from Altera's Cyclone II FPGA series. The Quartus II software is used for hardware description, synthesis and place and route, while simulation of the implemented design was done using ModelSim simulation software. The system is tested on DE1, Altera's development board. The implemented solution, comprising both the encryption and the decryption system on the same device, occupies 56% of the logic elements of the *EP2C20F484C7* device. The system encrypts data at a rate of 1543 Mbps for a key length of 128 bits. Future development would include an effort to reduce the utilization of logic resources, using, for example, embedded memory blocks.

#### ACKNOWLEDGEMENT

This paper is the result of the undergraduate thesis "*New technologies and families of programmable logic devices*", defended by Ana Krkljić at the Faculty of Electrical Engineering, University of Banjaluka, December 2013.

#### REFERENCES

- [1] Joan Daemen and Vincent Rijmen, The Rijndael Block Cipher, Retrieved November 01, 2013, from http://csrc.nist.gov/archive/aes/rijndael/Rijndael-ammended.pdf
- [2] Ghewari, P., Patil, J., Chougule, A., "Efficient Hardware Design and Implementation of AES Cryptosystem", International Journal of Engineering Science and Technology, Vol. 2, 2010, pp. 213-219.
- [3] Mali, M., Novak, F., Biasizzo A., "Hardware Implementation of AES algorithm", Journal of Electrical Engineering, Vol. 56, No. 9-10, 2005, pp. 265-269.
- [4] C.P. Fan, J. K. Hwang, "FPGA Implementations of High Throughput Sequential and Fully Pipelined AES Algorithm", International Journal of Electrical Engineering, Vol. 15, No. 6, 2008, pp. 447-455.
- [5] Adib, S., Raissouni, N., "AES Encryption Algorithm Hardware Implementation: Throughput and Area Comparasion of 128, 192 and 256 bits KEY", International Journal of Reconfigurable and Embedded Systems, Vol. 1, No. 2, July 2012, pp. 67-74.

# FPGA Implementation of Montgomery Modular Multiplier

Velibor Škobić, Branko Dokić, and Željko Ivanović

*Abstract* – Montgomery architectures of modular multipliers with one-bit and two-bit scanning are described in this paper. The multipliers have been described using the hardware description language VHDL and implemented on the FPGA integrated circuit EP4CE115F29C7. A comparative analysis of the multipliers regarding minimum calculation time, maximum operating frequency and number of used logic elements is given. Based on the implemented modules, an analysis of an RSA module for data encryption is performed.

*Keywords* – Modular multiplication, Montgomery algorithm, RSA algorithm

#### I. INTRODUCTION

Data protection can be achieved using symmetric and asymmetric algorithms. Encryption with symmetric algorithms (e.g. AES) is performed by substitution, transposition and shifting, as well as logic operations (XOR) on data bytes. These operations are much simpler to implement in hardware, which leads to fewer resources and a higher operating speed of the encryption module. Symmetric algorithms are mostly used for data encryption. Asymmetric algorithms are about one order of magnitude slower than symmetric ones. They are used for key exchange and digital signatures, and data protection relies on quite demanding mathematical operations. One of the most commonly used asymmetric algorithms is the RSA algorithm [1]. In RSA, the message ($P$) is raised to the power of the public exponent ($e$), and the remainder after division by the public modulus ($m$) is then determined. For hardware realization of the RSA algorithm, the binary algorithm of modular exponentiation is typically used [2, 3]. Using the binary algorithm, modular exponentiation is reduced to iterative modular multiplication ($A \cdot B \bmod m$). Some modular multiplication algorithms are described in [2]. One of the most effective is the Montgomery algorithm [4]. The calculation is carried out by iterative summation. This algorithm is highly efficient and very simple to implement in hardware. It is used in algorithms with a large number of modular multiplications.
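The reduction of modular exponentiation to repeated modular multiplication can be sketched as follows. This is the generic right-to-left binary method with ordinary modular multiplication, shown only to illustrate the idea; it is not the implementation analyzed in this paper, and the function name is hypothetical.

```python
def mod_exp(p, e, m):
    """Compute p**e mod m by scanning the bits of e from the least significant."""
    result = 1
    base = p % m
    while e > 0:
        if e & 1:
            result = (result * base) % m   # modular multiplication
        base = (base * base) % m           # modular squaring
        e >>= 1
    return result


if __name__ == "__main__":
    # Small textbook-sized numbers, chosen only for illustration.
    print(mod_exp(65, 17, 3233))
```

Each loop iteration needs at most two modular multiplications, so an exponent of $k$ bits requires at most $2k$ of them. This is why the speed of the modular multiplier dominates RSA performance.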

In the second section, the fundamentals of Montgomery modular multiplication are given. Two variants of Montgomery modular multiplication are described: with scanning of one bit and of two bits. Two RSA data encryption algorithms using the Montgomery modular multiplier are described. In the third section, the hardware realization of the Montgomery module is described. The fourth section presents simulation results of the implemented modules. Results are summarized in the conclusion.

# II. MONTGOMERY ALGORITHM

#### A. Montgomery algorithm

Montgomery algorithm is efficient and simple for hardware implementation. The result of modular multiplication is given by the following equation:

$$MonPro(A, B, m) = A \cdot B \cdot R^{-1} \mod m \qquad (1)$$

The advantage of this algorithm is that the calculation is performed without division by $m$; instead, division by the number $R$ is used. $R$ is taken in the form $2^k$, where $k$ is the number of bits needed to represent the input data. The number $R^{-1}$ is the inverse of $R$ modulo $m$. In hardware, division by $2^k$ is a simple shift by $k$ bits to the right. As shown in equation (1), the result of modular multiplication contains the factor $R^{-1}$. This factor can be eliminated by first converting the input data $A$ and $B$ to the residue system modulo $m$ ($\bar{A}$ = MonPro($A$, $R^2 \bmod m$, $m$) = $A \cdot R \bmod m$, and likewise $\bar{B}$), computing $\bar{D}$ = MonPro($\bar{A}$, $\bar{B}$, $m$), and then converting the result back by multiplying it by 1 ($D$ = MonPro($\bar{D}$, 1, $m$)). As a result, we have $D = A \cdot B \bmod m$. Montgomery modular multiplication is given by the following pseudo code:

Result: $D = (A \cdot B) \bmod m$

1. $D = 0$
2. from $i = 0$ to $i = k-1$
   - a. $D = D + A \cdot b_i$
   - b. $D = (D + D(0) \cdot m)/2$
3. output $D$

Listing 1

The algorithm passes through $k$ iterations, where $k$ is the number of bits used to represent the numbers $A$, $B$ and $m$. In each iteration one bit $b_i$ of the number $B$ is scanned. Depending on the value of the scanned bit $b_i$, the number $A$ is added to $D$. Then, so that the division by 2 in every iteration is exact, the number $m$ is added if the current result $D$ is odd ($m$ is odd in the RSA algorithm, being a product of two large primes); otherwise zero is added. This realization requires two adders, one shift register and control logic.
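A software model of the one-bit scanning loop may help to follow Listing 1. The sketch below assumes $A, B < m < 2^k$ with $m$ odd, and adds a final conditional subtraction (not shown in the listing) so that the result lies in the range $[0, m)$. The helper `mod_mul` is hypothetical and assumes that operands enter the residue system through a multiplication by $R^2 \bmod m$.

```python
def mon_pro(a, b, m, k):
    """One-bit-scanning Montgomery product: a * b * 2**(-k) mod m (m odd)."""
    d = 0
    for i in range(k):
        b_i = (b >> i) & 1
        d = d + a * b_i                 # step a: add A if the scanned bit is 1
        d = (d + (d & 1) * m) >> 1      # step b: make D even, then halve it
    if d >= m:                          # assumed final correction step
        d -= m
    return d


def mod_mul(a, b, m, k):
    """Plain a * b mod m built from four Montgomery products (illustrative)."""
    r2 = pow(1 << k, 2, m)              # R^2 mod m, assumed precomputed
    a_bar = mon_pro(a, r2, m, k)        # A * R mod m
    b_bar = mon_pro(b, r2, m, k)        # B * R mod m
    d_bar = mon_pro(a_bar, b_bar, m, k) # A * B * R mod m
    return mon_pro(d_bar, 1, m, k)      # A * B mod m


if __name__ == "__main__":
    print(mod_mul(200, 123, 239, 8), (200 * 123) % 239)
```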

The number of iterations can be reduced by scanning two bits of the number $B$ ($b_i$, $b_{i+1}$) in one iteration. The number of cycles needed for the calculation is then halved ($g = k/2$). Based on this, the Montgomery algorithm pseudo code can be written as follows:

Result  
$$D = A \cdot B \cdot R^{-1} \mod m, \quad R = 2^k$$

1. $D = 0$  
2. from $i = 0$ to $i = k-2$ in steps of 2 ($g = k/2$ iterations)  
a. $D = D + A \cdot b_i$  
b. $D = (D + D(0) \cdot m)/2$  
c. $D = D + A \cdot b_{i+1}$  
d. $D = (D + D(0) \cdot m)/2$  
3. output $D$  

Listing 2

The additions that depend on the values of the scanned bits $b_i$ and $b_{i+1}$ can be grouped into one equation, as can the conditions for adding the number *m*. At the end of each iteration the result is divided by 4, i.e. the content is shifted two positions to the right. Based on these transformations, the following algorithm is formed:

Result  
$$D = A \cdot B \cdot R^{-1} \mod m, \quad R = 2^k$$

1. $D = 0$  
2. from $i = 0$ to $i = k-2$ in steps of 2 ($g = k/2$ iterations)  
a. $D = D + A \cdot b_i + 2A \cdot b_{i+1}$  
b. $D = D + U_1 \cdot m + U_2 \cdot 2m$  
c. $D = D/4$  
3. output $D$  

Listing 3

Condition $U_1$ for adding the number *m* to the result *D* is defined by the equation:

$$U_1 = b_i \cdot A(0) \oplus D(0) \tag{2}$$

while condition $U_2$ for adding the number 2m to the result is defined by:

$$U[1,0] = D[1,0] + b_i \cdot A[1,0] + U_1 \cdot m[1,0] + b_{i+1} \cdot 2 \cdot A[1,0], \qquad U_2 = U(1) \tag{3}$$

Here X(0) denotes the least significant bit of X, X(1) the next bit, and X[1,0] the two least significant bits.

The first algorithm, which scans only one bit $b_i$, passes through k iterations. The second algorithm, which scans two consecutive bits $b_i$ and $b_{i+1}$, passes through k/2 iterations. In the first, the numbers A and m are added depending on the current value of the result and the value of bit $b_i$. The second adds the numbers A, 2·A, m and 2·m in one iteration, depending on the state of conditions $U_1$, $U_2$ and the values of $b_i$ and $b_{i+1}$.
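The two-bit scanning loop of Listing 3, with the conditions of equations (2) and (3), can be sketched in the same way. This is an illustrative Python model only; the name `mon_pro_b2` is an assumption, and the sketch assumes that m is odd, A < m and k is even. The loop index advances by two bit positions per iteration, so that k/2 iterations are executed.

```python
def mon_pro_b2(a: int, b: int, m: int, k: int) -> int:
    """Two-bit-scanning Montgomery product modelled on Listing 3 (illustrative).

    Returns a value congruent to a * b * 2**(-k) modulo m, in [0, 2m).
    Assumes m is odd, 0 <= a < m and k is even.
    """
    d = 0
    for i in range(0, k, 2):              # k/2 iterations, two bits of B each
        b0 = (b >> i) & 1                 # b_i
        b1 = (b >> (i + 1)) & 1           # b_{i+1}
        # Equation (2): U1 is the parity of D + b_i * A.
        u1 = (b0 & a & 1) ^ (d & 1)
        # Equation (3): only the two low bits of each operand are needed.
        u = (d & 3) + b0 * (a & 3) + u1 * (m & 3) + b1 * 2 * (a & 3)
        u2 = (u >> 1) & 1
        d = d + a * b0 + 2 * a * b1       # step 2a
        d = d + u1 * m + u2 * 2 * m       # step 2b: sum is now divisible by 4
        assert d % 4 == 0
        d >>= 2                           # step 2c
    return d


print(mon_pro_b2(688, 640, 3337, 16))  # 111, same result as one-bit scanning
```

Because m is odd, adding m fixes the least significant bit and adding 2m fixes the next bit, which is why two conditions are enough to make the division by 4 exact.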

#### B. Carry Save Adders

In the implementation of Montgomery modular multiplication, special attention is paid to the implementation of the adder. Higher speed can be achieved using CSA (*Carry Save Adders*). A CSA has three input vectors and two output vectors. The result of the addition consists of vector C and vector S and is represented in redundant form. Vector C is the vector of carry bits, while vector S is the vector of sum bits. The result of the addition is:

$$(C,S) = X + Y + Z \tag{4}$$

In a CSA the carry bit does not propagate from one full adder to the next, but is stored in vector *C*. The carry propagation time is thus eliminated, and the time needed to obtain the result is reduced. To get the final result, vectors *C* and *S* need to be added using a carry-propagate adder, e.g. RCA (*Ripple Carry Adder*).
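Equation (4) can be illustrated with a short bit-level sketch. The Python function below is a behavioural model written for this illustration, not a description of the implemented circuit; the name `carry_save_add` and the operand values are assumptions.

```python
from typing import Tuple


def carry_save_add(x: int, y: int, z: int) -> Tuple[int, int]:
    """Return (C, S) with C + S == x + y + z, computed bit by bit.

    Each bit position acts as an independent full adder, so no carry
    travels along the word.
    """
    s = x ^ y ^ z                             # sum bit of every position
    c = ((x & y) | (x & z) | (y & z)) << 1    # carry bit, weighted by 2
    return c, s


c, s = carry_save_add(215, 0, 3337)
print(c, s, c + s)  # 2 3550 3552
```

Only the final addition `c + s` requires carry propagation, which is the step assigned to the ripple carry adder in the text above.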

#### C. RSA algorithm

Montgomery modular multipliers are used to implement an RSA module for data protection. Data encryption using the RSA algorithm is performed as follows:

$$C = P^e \mod m$$

where P is the data to be encrypted, the pair of numbers e and m is the public key, and C is the encrypted data. Data decryption is performed as follows:

$$P = C^d \mod m$$

where the pair of numbers d and m is the private key. The encryption and decryption procedures are very similar: the public key e is used as the exponent for encryption, and the private key d for decryption. For hardware implementation of modular exponentiation, the method of exponent bit scanning is suitable [2]. Depending on the direction of scanning, there are two methods: scanning from left to right and scanning from right to left. The right-to-left algorithm is given by the following pseudo code [3]:

From right to left  
Result $C = P^e \mod m$

1. $K = 2^{2n} \mod m$  
2. *Z* = *MonPro*(1, *K*, *m*)  
3. *P* = *MonPro*(*P*, *K*, *m*)  
4. from *i* = 0 to *i* = *k*-1  
a. if *e<sub>i</sub>* = 1 then *Z* = *MonPro*(*Z*, *P*, *m*)  
b. *P* = *MonPro*(*P*, *P*, *m*)  
5. *Z* = *MonPro*(1, *Z*, *m*)  
6. *C* = *Z*  

Listing 4

The left-to-right algorithm is given by [3]:

From left to right  
Result $C = P^e \mod m$

1. $K = 2^{2n} \mod m$  
2. *Z* = *MonPro*(1, *K*, *m*)  
3. *P* = *MonPro*(*P*, *K*, *m*)  
4. from *i* = *k*-1 to *i* = 0  
a. *Z* = *MonPro*(*Z*, *Z*, *m*)  
b. if *e<sub>i</sub>* = 1 then *Z* = *MonPro*(*Z*, *P*, *m*)  
5. *Z* = *MonPro*(1, *Z*, *m*)  
6. *C* = *Z*  

Listing 5

The right-to-left algorithm needs two Montgomery modular multipliers working in parallel, while the left-to-right algorithm needs only one multiplier, operated sequentially. The left-to-right algorithm thus saves one Montgomery multiplier, but the number of calculation iterations doubles: the number of cycles is k+3 for the right-to-left algorithm and 2(k+3) for the left-to-right algorithm.
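Both exponentiation schemes can be checked against each other with a short software model. The sketch below is illustrative only: `mon_pro` evaluates equation (1) directly instead of modelling the multiplier hardware, the function names are assumptions, and the constant K is taken as $R^2 \bmod m$ with $R = 2^k$, i.e. n in the listings is assumed to equal k. The exponents e = 79 and d = 1019 are example values for m = 3337 = 47 · 71 chosen for this illustration; they are not taken from the paper.

```python
def mon_pro(a: int, b: int, m: int, r_inv: int) -> int:
    """Reference Montgomery product a * b * R^-1 mod m, as in equation (1)."""
    return (a * b * r_inv) % m


def rsa_right_to_left(p: int, e: int, m: int, k: int) -> int:
    """Exponent bits scanned from the least significant bit upwards."""
    r = 1 << k
    r_inv = pow(r, -1, m)               # modular inverse, Python 3.8+
    big_k = (r * r) % m                 # step 1 (assumes n = k)
    z = mon_pro(1, big_k, m, r_inv)     # step 2: Z = R mod m
    p = mon_pro(p, big_k, m, r_inv)     # step 3: P in Montgomery form
    for i in range(k):                  # step 4
        if (e >> i) & 1:
            z = mon_pro(z, p, m, r_inv)     # independent of the squaring,
        p = mon_pro(p, p, m, r_inv)         # so both can run in parallel
    return mon_pro(1, z, m, r_inv)      # step 5: leave Montgomery form


def rsa_left_to_right(p: int, e: int, m: int, k: int) -> int:
    """Exponent bits scanned from the most significant bit downwards."""
    r = 1 << k
    r_inv = pow(r, -1, m)
    big_k = (r * r) % m
    z = mon_pro(1, big_k, m, r_inv)
    p = mon_pro(p, big_k, m, r_inv)
    for i in range(k - 1, -1, -1):
        z = mon_pro(z, z, m, r_inv)         # the multiply needs this result,
        if (e >> i) & 1:                    # so the steps are sequential
            z = mon_pro(z, p, m, r_inv)
    return mon_pro(1, z, m, r_inv)


c_rl = rsa_right_to_left(688, 79, 3337, 16)
c_lr = rsa_left_to_right(688, 79, 3337, 16)
print(c_rl == c_lr == pow(688, 79, 3337))           # True
print(rsa_left_to_right(c_lr, 1019, 3337, 16))      # 688, the original data
```

The comments in the two loops show where the difference in multiplier count comes from: in the right-to-left loop the two products of one iteration do not depend on each other, whereas in the left-to-right loop the second product uses the result of the first.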

#### **III. HARDWARE IMPLEMENTATION**

Figure 1 shows the architecture of the Montgomery modular multiplier with one-bit scanning (Listing 1). It consists of the C\_sig and S\_sig registers, a shift register, multiplexers for signal routing, a CSA network and control logic. The C\_sig and S\_sig registers temporarily store the current result. The shift register generates bit $b_i$ at its output. Depending on the value of bit $b_i$, signal 0 or A is routed to the CSA network input. The CSA network consists of two CSAs. Its inputs are the C\_sig and S\_sig signals and the output signals of the first and second multiplexers, and its output signals are stored in the C\_sig and S\_sig registers. The control logic controls the operation of the shift register, generates signal $U_1$ and controls the state of the C\_sig and S\_sig registers. The final result is available after k+2 clock cycles. In the first clock cycle, the C\_sig and S\_sig registers are reset and the data B is written into the shift register. During the next k cycles, the summation defined by the loop in Listing 1 is performed. In clock cycle k+2, the result of the modular multiplication (C\_sig and S\_sig) is converted to normal form (D) using a full adder.
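The data path described above can be modelled cycle by cycle. The sketch below is a behavioural Python model under stated assumptions, not the VHDL description: the first CSA is assumed to add the multiplexed value 0 or A, the second CSA the value 0 or m, and $U_1$ is assumed to be taken from the least significant sum bit of the first CSA (its carry vector is always even). The names `csa` and `montgomery_b1_trace` are hypothetical.

```python
from typing import List, Tuple


def csa(x: int, y: int, z: int) -> Tuple[int, int]:
    """Carry-save adder: returns (C, S) with C + S == x + y + z."""
    return ((x & y) | (x & z) | (y & z)) << 1, x ^ y ^ z


def montgomery_b1_trace(a: int, b: int, m: int,
                        k: int) -> Tuple[int, List[Tuple[int, int]]]:
    """Model of the one-bit-scanning multiplier in redundant (C, S) form.

    Returns the final result D and the (S_sig, C_sig) pair after each of
    the k loop cycles.
    """
    c_sig, s_sig = 0, 0                   # cycle 1: registers reset, B loaded
    trace = []
    for i in range(k):                    # cycles 2 .. k+1
        b_i = (b >> i) & 1                # shift register output
        c1, s1 = csa(c_sig, s_sig, a if b_i else 0)   # first CSA
        u1 = s1 & 1                       # parity of the partial sum
        c2, s2 = csa(c1, s1, m if u1 else 0)          # second CSA
        c_sig, s_sig = c2 >> 1, s2 >> 1   # both vectors are even: exact halving
        trace.append((s_sig, c_sig))
    d = c_sig + s_sig                     # cycle k+2: carry-propagate addition
    return d, trace


d, trace = montgomery_b1_trace(688, 640, 3337, 16)
print(d)            # 111
print(trace[11:14]) # [(1775, 1), (886, 2), (440, 4)]
```

For A = 688, B = 640 and m = 3337 the model produces the S\_sig and C\_sig pairs (1775, 1), (886, 2) and (440, 4), which can be compared with the register values in the simulation waveform of Fig. 3.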



Fig. 1. Architecture of Montgomery modular multiplier with one bit scanning

Figure 2 shows the architecture of the Montgomery modular multiplier with two-bit scanning (Listing 3). It consists of the $C\_sig$ and $S\_sig$ registers, a shift register, multiplexers for signal routing, a CSA network and control logic. The $C\_sig$ and $S\_sig$ registers temporarily store the current result. The shift register generates bits $b_i$ and $b_{i+1}$ at its output. Depending on the values of bits $b_i$ and $b_{i+1}$, signals 0, A and 2A are routed to the CSA network input. Depending on the state of signals $U_1$ and $U_2$, signals 0, m and 2m are routed to the CSA network through multiplexer MUX2.



Fig. 2. Architecture of Montgomery modular multiplier with two bits scanning

This CSA network consists of four CSAs. Its inputs are the $C\_sig$ and $S\_sig$ signals and the output signals of the first and second multiplexers. The output signals of the CSA network are then stored in the $C\_sig$ and $S\_sig$ registers. The control logic controls the shift register, generates the $U_1$ and $U_2$ signals and controls the operating states of the $C\_sig$ and $S\_sig$ registers. The final result is available after k/2+2 clock cycles. In the first clock cycle, the $C\_sig$ and $S\_sig$ registers are reset and the data *B* is written into the shift register. During the next k/2 cycles, the summation defined by the loop in Listing 3 is performed. In clock cycle k/2+2, the result of the modular multiplication ( $C\_sig$ and $S\_sig$ ) is converted to normal form (D) using a full adder.

Based on the presented architectures, two modules for Montgomery modular multiplication have been implemented: *Montgomery\_b1* with one-bit scanning and *Montgomery\_b2* with two-bit scanning. The modules are described in the hardware description language VHDL, and synthesized and simulated with the Quartus II and ModelSim software packages. The number of resources required for module implementation, the maximum operating frequency and the minimum calculation time have been analyzed.

Using the implemented Montgomery modules, an analysis of RSA modules for data encryption is carried out. Based on the right-to-left and left-to-right RSA algorithms with Montgomery modular multipliers, the encryption modules RSA\_Montgomery\_rl\_b1, RSA\_Montgomery\_rl\_b2, RSA\_Montgomery\_lr\_b1 and RSA\_Montgomery\_lr\_b2 are implemented. These modules are analyzed in terms of the number of used resources and the maximum data encryption speed.

# **IV. RESULTS**

In this paper, the Montgomery modular multipliers and RSA modules are implemented on the Altera Cyclone IV family FPGA integrated circuit EP4CE115F29C7 [5]. This component contains 266 embedded multipliers (18x18 bits), 4 PLL blocks, 3888 Kbits of embedded memory, 528 I/O pins and 114480 logic elements. An FPGA circuit was chosen for its availability, ease of system testing, flexibility, and relatively good performance in terms of speed and power consumption.

[Simulation waveform: signals N, clk, reset, A, B, M, go, i, bi, S\_sig, C\_sig and D. For N = 16, A = 688, B = 640 and M = 3337, D takes the values 344, 172, 430, 215, 1776, 888, 444, 222 and finally 111.]

Fig. 3. Signal waveforms of Montgomery\_b1 module



Figures 3 and 4 show the simulation results of the *Montgomery\_b1* and *Montgomery\_b2* modules, respectively. For the input values A=688, B=640 and M=3337, with k=16, the result of Montgomery modular multiplication is D=111.

Figure 5 presents the logic resources needed to implement these modules. The *Montgomery\_b1* module occupies fewer resources than the *Montgomery\_b2* module, as a consequence of its smaller CSA network and signal routing logic.



Fig. 5. Number of used logic elements as a function of data length

Table I shows the results of the module analysis with respect to maximum operating frequency. At every data length, the higher maximum operating frequency is obtained for the *Montgomery\_b1* module. This is a consequence of the shorter signal propagation time through its CSA network, which consists of two CSAs.

TABLE I Maximum operating frequency [MHz]

| k    | Montgomery_b1 | Montgomery_b2 |
|------|---------------|---------------|
| 16   | 323.94        | 160.77        |
| 32   | 188.04        | 159.08        |
| 64   | 206.83        | 157.65        |
| 128  | 234.03        | 152.86        |
| 256  | 252.4         | 152.84        |
| 512  | 239.87        | 145.31        |
| 1024 | 298.33        | 126.2         |
| 2048 | 173.19        | 126.98        |

Multiplying the clock period at the maximum operating frequency (Table I) by the number of clock cycles needed to process one data word gives the minimum calculation time for one data word as a function of data length. These results are shown in Fig. 6. The lower calculation time is obtained for the *Montgomery\_b2* implementation. The maximum operating frequency of the *Montgomery\_b2* module is lower than that of the *Montgomery\_b1* module, but the number of clock cycles needed for the calculation is halved, so that better results are obtained with respect to calculation time.
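The calculation described above is a single division, shown here for three entries of Table I. The helper name `calc_time_us` is an assumption made for this sketch; the cycle counts k+2 and k/2+2 are those given in the hardware description.

```python
def calc_time_us(cycles: int, f_max_mhz: float) -> float:
    """Calculation time in microseconds: cycles times the clock period.

    With the frequency in MHz, cycles / f_max_mhz is directly in µs.
    """
    return cycles / f_max_mhz


# Montgomery_b1 needs k + 2 cycles, Montgomery_b2 needs k/2 + 2 cycles.
t_b1_16 = calc_time_us(16 + 2, 323.94)          # k = 16, one-bit scanning
t_b2_16 = calc_time_us(16 // 2 + 2, 160.77)     # k = 16, two-bit scanning
t_b2_1024 = calc_time_us(1024 // 2 + 2, 126.2)  # k = 1024, two-bit scanning

print(f"{t_b1_16:.3f} {t_b2_16:.3f} {t_b2_1024:.2f}")  # 0.056 0.062 4.07
```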



Fig. 6. Minimum calculation time as a function of data length

The analysis of the RSA modules for data encryption gives the following results. Fig. 7 shows the number of logic elements of the implemented modules as a function of data length.



Fig. 7. Number of used logic elements as a function of data length



Fig. 8. Maximum data encryption speed as a function of data length

The implementation of the left-to-right algorithm with the one-bit-scanning Montgomery modular multiplier occupies the fewest logic elements. The implementation of the right-to-left algorithm with the two-bit-scanning Montgomery modular multiplier uses the most resources. Fig. 8 presents the results of the module analysis in terms of maximum data encryption speed. The best results are obtained with the right-to-left algorithm and the Montgomery modular multiplier with two-bit scanning: for a key length of 1024 bits, the maximum encryption speed is 55.32 kb/s.

# **V. CONCLUSION**

The multipliers are implemented on Altera's Cyclone IV family FPGA integrated circuit EP4CE115F29C7. Synthesis and simulation are performed with the Quartus II and ModelSim software packages. The number of used logic elements depends on the data length and is higher for the multiplier architecture with two-bit scanning: for example, the increase is 55% for a data length of k = 128 bits and 56% for k = 1024 bits. Maximum operating frequency and minimum calculation time also depend on the data length. The modular multiplier with one-bit scanning has the higher operating frequency, while the multiplier with two-bit scanning has the lower average calculation time. The maximum frequency of the *Montgomery\_b1* module ranges from about 324 MHz for k=16 bits down to 188 MHz for k=1024 bits. For the *Montgomery\_b2* module, this frequency ranges from 161 MHz down to 126 MHz. Calculation time ranges from $0.05\mu s$ (k=16 bits) up to $5.45\mu s$ (k=1024 bits) for *Montgomery\_b1*, and from $0.06\mu s$ up to $4.07\mu s$ for *Montgomery\_b2*. The average calculation time of the *Montgomery\_b2* implementation is about 23% lower than that of the *Montgomery\_b1* implementation. Data encryption speed is highest for the *RSA\_Montgomery\_rl\_b2* implementation: for a key length of k=1024 bits, the maximum encryption speed is 55.32 kb/s and the number of used logic elements is 36960. The lowest number of used logic elements is obtained with the *RSA\_Montgomery\_lr\_b1* implementation: for a key length of k=1024 bits, it uses 22649 logic elements and reaches a maximum encryption speed of 26.28 kb/s.

#### REFERENCES

- [1] R. L. Rivest, A. Shamir, L. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems," Communications of the ACM, vol. 21, no. 2, pp. 120-126, Feb. 1978.
- [2] C. K. Koc, "RSA Hardware Implementation," TR 801, RSA Laboratories, Apr. 1996.
- [3] V. Škobić, B. Dokić, Ž. Ivanović, "FPGA Implementacija RSA algoritma," Proceedings of 57th ETRAN Conference, Zlatibor, Serbia, June 3-6, 2013, pp. EL3.8.1-5.
- [4] P. L. Montgomery, "Modular Multiplication Without Trial Division," Mathematics of Computation, vol. 44, no. 170, pp. 519-521, Apr. 1985.
- [5] "Cyclone IV EP4CE115F29C7 Data Sheets," http://www.altera.com.

# List of Authors

| No. | Author                  | Pages            |
|-----|-------------------------|------------------|
| 1.  | Andrejević Stošović, M. | 43, 113          |
| 2.  | Aleksić, S.             | 59               |
| 3.  | Andonova, A.            | 51               |
| 4.  | Angelov, A.             | 51               |
| 5.  | Babayan, V.             | 16, 20           |
| 6.  | Baghdasaryan, A.        | 901              |
| 7.  | Bojanić, S.             | 113              |
| 8.  | Damnjanović, M.         | 119              |
| 9.  | Dimitrijević, M.        | 43, 113          |
| 10. | Dimitrijević, T.        | 47               |
| 11. | Dimitrovski, A.         | 108              |
| 12. | Dingchyan, H.           | 20               |
| 13. | Djinevski, L.           | 65               |
| 14. | Dokić, B.               | 129, 132         |
| 15. | Dončov, N.              | 47               |
| 16. | Đorđević, G.S.          | 85               |
| 17. | Drača, D.               | 69               |
| 18. | Filiposka, S.           | 65               |
| 19. | Grigoryants, V.         | 20               |
| 20. | Grujić, D.              | 55               |
| 21. | Hayrapetyan, A.         | 20               |
| 22. | Hristov, M.             | 51               |
| 23. | Ilić, M.                | 102              |
| 24. | Ivanović, Ž.            | 132              |
| 25. | Joković, J.             | 47               |
| 26. | Jovanović, B.           | 119              |
| 27. | Jovanović, P.           | 55               |
| 28. | Kazmierski, T.          | 95               |
| 29. | Krčum, D.               | 11, 55           |
| 30. | Krkljić, A.             | 129              |
| 31. | Leech, L.               | 95               |
| 32. | Litovski, V.            | 30, 43, 113, 123 |
| 33. | Marković, V.            | 24               |
| 34. | Martirosyan, A.         | 20               |
| 35. | Melikyan, V.            | 16, 20           |
| 36. | Milić, M.               | 30               |
| 37. | Milovanović, B.         | 47               |
| 38. | Milovanović, D.         | 37               |
| 39. | Mirković, D.            | 37               |
| 40. | Mishkovski, I.          | 65               |
| 41. | Mišić, J.               | 24               |
| 42. | Nieto, O.               | 113              |
| 43. | Panajotović, A. | 69              |
| 44. | Pantić, D.      | 59              |
| 45. | Pantić, D.      | 59              |
| 46. | Petković, M.    | 85              |
| 47. | Petković, P.    | 37, 74, 123     |
| 48. | Petrović, V.    | 102             |
| 49. | Petrušić, A.    | 79              |
| 50. | Petrušić, Z.    | 79              |
| 51. | Poghosyan M. S. | 16              |
| 52. | Sahakyan S. A.  | 16, 20          |
| 53. | Sahakyan S. A.  | 16              |
| 54. | Šaranovac, L.   | 11              |
| 55. | Savić, M.       | 11, 55          |
| 56. | Schoof, G.      | 102             |
| 57. | Sekulović, N.   | 69              |
| 58. | Škobić, V.      | 129, 132        |
| 59. | Spasova, M.     | 51              |
| 60. | Stamenković, Z. | 102             |
| 61. | Stanojlović, M. | 123             |
| 62. | Stevanović, D.  | 74              |
| 63. | Takov, T.       | 51              |
| 64. | Todorović, D.   | 85              |
| 65. | Trajanov, D.    | 65, 108         |
| 66. | Zdraveski, V.   | 108             |
| 67. | Zerbe, V.       | 74              |